MariaDB Server / MDEV-12754

innodb.truncate_debug fails in buildbot with embedded due to semaphore wait

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 10.2 (EOL)
    • Fix Version: 10.2.6
    • Component: Tests
    • Labels: None

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/8062/steps/test_1/logs/stdio

      innodb.truncate_debug 'innodb'           w2 [ fail ]
              Test ended at 2017-05-08 21:58:39
       
      CURRENT_TEST: innodb.truncate_debug
      Warning: /mnt/buildbot/build/mariadb-10.2.6/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-ca=/mnt/buildbot/build/mariadb-10.2.6/mysql-test/std_data/cacert.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.6/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-cert=/mnt/buildbot/build/mariadb-10.2.6/mysql-test/std_data/client-cert.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.6/libmysqld/examples/mysqltest_embedded: unknown variable 'loose-ssl-key=/mnt/buildbot/build/mariadb-10.2.6/mysql-test/std_data/client-key.pem'
      Warning: /mnt/buildbot/build/mariadb-10.2.6/libmysqld/examples/mysqltest_embedded: unknown option '--loose-skip-ssl'
      --Thread 2845211456 has waited at btr0sea.ic line 114 for 241.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0xb88a628 created in file btr0sea.cc line 193
      a writer (thread id 2845211456) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 0, lock_word: ffffffff
      Last time read locked in file btr0sea.ic line 152
      Last time write locked in file not yet reserved line 0
      --Thread 2845211456 has waited at btr0sea.ic line 114 for 241.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0xb88a628 created in file btr0sea.cc line 193
      a writer (thread id 2845211456) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 0, lock_word: ffffffff
      Last time read locked in file btr0sea.ic line 152
      Last time write locked in file not yet reserved line 0
      --Thread 2963266368 has waited at srv0srv.cc line 2175 for 240.00 seconds the semaphore:
      Mutex at 0xb9003d8, Mutex DICT_SYS created dict0dict.cc:1101, lock var 2
      Last time reserved in file /home/buildbot/buildbot/build/mariadb-10.2.6/storage/innobase/btr/btr0sea.cc line 313
      InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
      InnoDB: Pending reads 0, writes 0
       
      =====================================
      2017-05-08 21:57:45 0xb11feb40 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 16 seconds
      -----------------
      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 9 srv_idle
      srv_master_thread log flush and writes: 10
      ----------
      SEMAPHORES
      ----------
      -------------
      RW-LATCH INFO
      -------------
      RW-LOCK: 0xb6016758 
      Locked: thread 2845510464 file dict0dict.ic line 1138  X-LOCK
      RW-LOCK: 0xb6009d78 
      Locked: thread 2845510464 file dict0dict.ic line 1138  X-LOCK
      RW-LOCK: 0xb6009518 
      Locked: thread 2845510464 file dict0dict.ic line 1138  X-LOCK
      RW-LOCK: 0xb900368 
      Locked: thread 2963266368 file srv0srv.cc line 2173  X-LOCK
      RW-LOCK: 0xb88ae78 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88ad48 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88ac18 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88aae8 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88a9b8 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88a888 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88a758 
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      RW-LOCK: 0xb88a628 
      Locked: thread 2845211456 file btr0sea.ic line 114  WAIT X-LOCK
      Locked: thread 2845510464 file btr0sea.ic line 152  S-LOCK
      Total number of rw-locks 1084
      OS WAIT ARRAY INFO: reservation count 165
      --Thread 2845211456 has waited at btr0sea.ic line 114 for 250.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0xb88a628 created in file btr0sea.cc line 193
      a writer (thread id 2845211456) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 0, lock_word: ffffffff
      Last time read locked in file btr0sea.ic line 152
      Last time write locked in file not yet reserved line 0
      --Thread 2963266368 has waited at srv0srv.cc line 2175 for 249.00 seconds the semaphore:
      Mutex at 0xb9003d8, Mutex DICT_SYS created dict0dict.cc:1101, lock var 2
      Last time reserved in file /home/buildbot/buildbot/build/mariadb-10.2.6/storage/innobase/btr/btr0sea.cc line 313
      OS WAIT ARRAY INFO: signal count 117
      RW-shared spins 0, rounds 21, OS waits 9
      RW-excl spins 0, rounds 34, OS waits 5
      RW-sx spins 0, rounds 0, OS waits 0
      Spin rounds per wait: 21.00 RW-shared, 34.00 RW-excl, 0.00 RW-sx
      ------------
      TRANSACTIONS
      ------------
      Trx id counter 1341
      Purge done for trx's n:o < 1341 undo n:o < 0 state: running but idle
      History list length 7
      Total number of lock structs in row lock hash table 0
      LIST OF TRANSACTIONS FOR EACH SESSION:
      ---TRANSACTION 18446744072480630052, not started
      0 lock struct(s), heap size 1096, 0 row lock(s)
      ---TRANSACTION 1319, ACTIVE 260 sec truncating table
      mysql tables in use 1, locked 1
      0 lock struct(s), heap size 1096, 0 row lock(s)
      MySQL thread id 7, OS thread handle 0x0, query id 2845510464 localhost root debug sync point: buffer_pool_scan
      truncate table t1
      --------
      FILE I/O
      --------
      I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
      I/O thread 1 state: waiting for completed aio requests (log thread)
      I/O thread 2 state: waiting for completed aio requests (read thread)
      I/O thread 3 state: waiting for completed aio requests (read thread)
      I/O thread 4 state: waiting for completed aio requests (write thread)
      I/O thread 5 state: waiting for completed aio requests (write thread)
      Pending normal aio reads: [0, 0] , aio writes: [0, 0] ,
       ibuf aio reads:, log i/o's:, sync i/o's:
      Pending flushes (fsync) log: 0; buffer pool: 0
      190 OS file reads, 287 OS file writes, 63 OS fsyncs
      0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
      -------------------------------------
      INSERT BUFFER AND ADAPTIVE HASH INDEX
      -------------------------------------
      Ibuf: size 1, free list len 0, seg size 2, 3 merges
      merged operations:
       insert 0, delete mark 0, delete 0
      discarded operations:
       insert 0, delete mark 0, delete 0
      InnoDB: ###### Diagnostic info printed to the standard error stream
      --Thread 2845211456 has waited at btr0sea.ic line 114 for 272.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0xb88a628 created in file btr0sea.cc line 193
      a writer (thread id 2845211456) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffff
      Last time read locked in file btr0sea.ic line 152
      Last time write locked in file not yet reserved line 0
      --Thread 2963266368 has waited at srv0srv.cc line 2175 for 271.00 seconds the semaphore:
      Mutex at 0xb9003d8, Mutex DICT_SYS created dict0dict.cc:1101, lock var 2
      Last time reserved in file /home/buildbot/buildbot/build/mariadb-10.2.6/storage/innobase/btr/btr0sea.cc line 313
      --Thread 2845211456 has waited at btr0sea.ic line 114 for 272.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0xb88a628 created in file btr0sea.cc line 193
      a writer (thread id 2845211456) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffff
      Last time read locked in file btr0sea.ic line 152
      Last time write locked in file not yet reserved line 0
      --Thread 2963266368 has waited at srv0srv.cc line 2175 for 271.00 seconds the semaphore:
      Mutex at 0xb9003d8, Mutex DICT_SYS created dict0dict.cc:1101, lock var 2
      Last time reserved in file /home/buildbot/buildbot/build/mariadb-10.2.6/storage/innobase/btr/btr0sea.cc line 313
      --Thread 2971659072 has waited at srv0srv.cc line 1367 for 22.00 seconds the semaphore:
      S-lock on RW-latch at 0xb88a628 created in file btr0sea.cc line 193
      a writer (thread id 2845211456) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 1, lock_word: ffffffff
      Last time read locked in file btr0sea.ic line 152
      Last time write locked in file not yet reserved line 0
      InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
      InnoDB: Pending reads 0, writes 0
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      0.00 hash searches/s, 0.19 non-hash searches/s
      ---
      LOG
      ---
      Log sequence number 1689903
      Log flushed up to   1689409
      Pages flushed up to 1689409
      Last checkpoint at  1689233
      0 pending log flushes, 0 pending chkp writes
      43 log i/o's done, 0.04 log i/o's/second
      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 8634368
      Dictionary memory allocated 56639
      Buffer pool size   511
      Free buffers       164
      Database pages     347
      Old database pages 0
      Modified db pages  2
      Percent of dirty pages(LRU & free pages): 0.391
      Max dirty pages percent: 75.000
      Pending reads 0
      Pending writes: LRU 0, flush list 0, single page 0
      Pages made young 0, not young 0
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 174, created 184, written 230
      0.00 reads/s, 0.05 creates/s, 0.00 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 347, unzip_LRU len: 0
      I/O sum[0]:cur[0], unzip sum[0]:cur[0]
      --------------
      ROW OPERATIONS
      --------------
      0 queries inside InnoDB, 0 queries in queue
      0 read views open inside InnoDB
      Process ID=11083, Main thread ID=2963266368, state: sleeping
      Number of rows inserted 63, updated 0, deleted 0, read 66
      0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
      Number of system rows inserted 0, updated 0, deleted 0, read 0
      0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
      ----------------------------
      END OF INNODB MONITOR OUTPUT
      ============================
      --- /mnt/buildbot/build/mariadb-10.2.6/mysql-test/suite/innodb/r/truncate_debug.result	2017-05-08 19:10:08.000000000 +0300
      +++ /mnt/buildbot/build/mariadb-10.2.6/mysql-test/suite/innodb/r/truncate_debug.reject	2017-05-08 21:58:26.000000000 +0300
      @@ -77,6 +77,8 @@
       SET DEBUG_SYNC= 'now SIGNAL finish_scan';
       SET DEBUG_SYNC= 'RESET';
       connection con1;
      +Warnings:
      +Warning	1639	debug sync point wait timed out
       disconnect con1;
       connection con2;
       disconnect con2;
       
      mysqltest: Result length mismatch
       
       
      =====================================
      2017-05-08 21:58:26 0xb11feb40 INNODB MONITOR OUTPUT
      =====================================
      Per second averages calculated from the last 41 seconds
      -----------------
      BACKGROUND THREAD
      -----------------
      srv_master_thread loops: 2 srv_active, 0 srv_shutdown, 9 srv_idle
      srv_master_thread log flush and writes: 11
      ----------
      SEMAPHORES
      ----------
      -------------
      RW-LATCH INFO
      -------------
      Total number of rw-locks 1070
      OS WAIT ARRAY INFO: reservation count 178
      OS WAIT ARRAY INFO: signal count 127
      RW-shared spins 0, rounds 40, OS waits 18
      RW-excl spins 0, rounds 128, OS waits 9
      RW-sx spins 0, rounds 0, OS waits 0
      Spin rounds per wait: 40.00 RW-shared, 128.00 RW-excl, 0.00 RW-sx
      ------------
      TRANSACTIONS
      ------------
      Trx id counter 1354
      Purge done for trx's n:o < 1354 undo n:o < 0 state: exited
      History list length 0
      Total number of lock structs in row lock hash table 0
      LIST OF TRANSACTIONS FOR EACH SESSION:
      --------
      FILE I/O
      --------
      I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
      I/O thread 1 state: waiting for completed aio requests (log thread)
      I/O thread 2 state: waiting for completed aio requests (read thread)
      I/O thread 3 state: waiting for completed aio requests (read thread)
      I/O thread 4 state: waiting for completed aio requests (write thread)
      I/O thread 5 state: waiting for completed aio requests (write thread)
      Pending normal aio reads: [0, 0] , aio writes: [0, 0] ,
       ibuf aio reads:, log i/o's:, sync i/o's:
      Pending flushes (fsync) log: 0; buffer pool: 0
      190 OS file reads, 297 OS file writes, 73 OS fsyncs
      0.00 reads/s, 0 avg bytes/read, 0.24 writes/s, 0.24 fsyncs/s
      -------------------------------------
      INSERT BUFFER AND ADAPTIVE HASH INDEX
      -------------------------------------
      Ibuf: size 1, free list len 0, seg size 2, 7 merges
      merged operations:
       insert 0, delete mark 0, delete 0
      discarded operations:
       insert 0, delete mark 0, delete 0
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      Hash table size 4421, used cells 0, node heap has 0 buffer(s)
      0.00 hash searches/s, 2.88 non-hash searches/s
      ---
      LOG
      ---
      Log sequence number 1703290
      Log flushed up to   1702639
      Pages flushed up to 1689903
      Last checkpoint at  1689233
      0 pending log flushes, 0 pending chkp writes
      50 log i/o's done, 7.00 log i/o's/second
      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 8634368
      Dictionary memory allocated 39367
      Buffer pool size   511
      Free buffers       157
      Database pages     354
      Old database pages 0
      Modified db pages  33
      Percent of dirty pages(LRU & free pages): 6.445
      Max dirty pages percent: 75.000
      Pending reads 0
      Pending writes: LRU 0, flush list 0, single page 0
      Pages made young 0, not young 0
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 174, created 191, written 230
      0.00 reads/s, 6.99 creates/s, 0.00 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 354, unzip_LRU len: 0
      I/O sum[0]:cur[0], unzip sum[0]:cur[0]
      --------------
      ROW OPERATIONS
      --------------
      0 queries inside InnoDB, 0 queries in queue
      0 read views open inside InnoDB
      Process ID=11083, Main thread ID=2963266368, state: sleeping
      Number of rows inserted 63, updated 0, deleted 0, read 66
      0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
      Number of system rows inserted 0, updated 0, deleted 0, read 0
      0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
      ----------------------------
      END OF INNODB MONITOR OUTPUT
      ============================
      InnoDB: ###### Diagnostic info printed to the standard error stream
      


          Activity

            It seems to me that a DEBUG_SYNC signal is lost. Here is how I reached this conclusion. First, I started the test:

            ./mtr --mem --parallel=auto --repeat=20 innodb.truncate_debug{,,,}
            

            Then I monitored the output, looking for a hang:

            innodb.truncate_debug 'innodb'           w1 [ pass ]   9366
            innodb.truncate_debug 'innodb'           w3 [ pass ]   9338
            innodb.truncate_debug 'innodb'           w4 [ pass ]   9344
            innodb.truncate_debug 'innodb'           w2 [ pass ]   9350
            innodb.truncate_debug 'innodb'           w1 [ pass ]   9325
            innodb.truncate_debug 'innodb'           w2 [ pass ]   9329
            innodb.truncate_debug 'innodb'           w2 [ pass ]   9261
            innodb.truncate_debug 'innodb'           w2 [ pass ]   9227
            innodb.truncate_debug 'innodb'           w2 [ pass ]   9209
            

            At this point, it looks like w1, w3, w4 are hung. I opened var/4/mysqld.1.err and got the process ID. Then I attached GDB to the running process and ran this:

            thread apply all backtrace
            kill
            

            This identified the hang inside the test:

            innodb.truncate_debug 'innodb'           w4 [ fail ]
                    Test ended at 2017-05-10 10:14:34
             
            CURRENT_TEST: innodb.truncate_debug
            mysqltest: At line 109: query 'reap' failed: 2013: Lost connection to MySQL server during query
            

            In the stack trace output, I saw the following waits:

            #9  0x0000000001fae530 in btr_search_x_lock_all ()
                at /mariadb/10.2/storage/innobase/include/btr0sea.ic:114
            #10 0x0000000001faebe7 in btr_search_disable (need_mutex=true)
                at /mariadb/10.2/storage/innobase/btr/btr0sea.cc:317
            #11 0x000000000199e307 in innodb_adaptive_hash_index_update (
                thd=0x62a0000ba208, var=0x3f05da0 <mysql_sysvar_adaptive_hash_index>, 
                var_ptr=0x404a480 <btr_search_enabled>, save=0x60b0000085e8)
                at /mariadb/10.2/storage/innobase/handler/ha_innodb.cc:18816
            

            and

            #4  0x0000000000abeff6 in debug_sync (thd=0x62a0000c0208, 
                sync_point_name=0x3217aa0 <.str.93> "buffer_pool_scan", name_len=16)
                at /mariadb/10.2/sql/debug_sync.cc:1552
            #5  0x0000000002165305 in fil_reinit_space_header (id=8, size=8, 
                trx=0x7fb08cd97528) at /mariadb/10.2/storage/innobase/fil/fil0fil.cc:3332
            #6  0x0000000001d6944a in row_truncate_table_for_mysql (table=0x618000032508, 
                trx=0x7fb08cd97528)
                at /mariadb/10.2/storage/innobase/row/row0trunc.cc:2021
            

            The issue here is that a DEBUG_SYNC signal was already sent to terminate the wait at the DEBUG_SYNC point buffer_pool_scan:

            connect (con1,localhost,root,,);
            SET DEBUG_SYNC= 'buffer_pool_scan SIGNAL started WAIT_FOR finish_scan';
            send truncate table t1;
            #… snip much stuff
            connect (con2,localhost,root,,);
            send set global innodb_adaptive_hash_index=off;
             
            connection default;
            SET DEBUG_SYNC= 'now SIGNAL finish_scan';
            SET DEBUG_SYNC= 'RESET';
             
            connection con1;
            reap;
            

            The DEBUG_SYNC signal was set synchronously, so it should have been delivered. Yet the TRUNCATE TABLE remains stuck.

            The test is designed so that con2 is blocked until the SIGNAL finish_scan allows TRUNCATE TABLE to continue and eventually release the btr_search_latches[].
            This should have happened by the time the TRUNCATE result is reaped in con1.

            I suspect that MariaDB is missing the MySQL fix BUG#14765080 DEBUG_SYNC CAN LOSE SIGNAL (and some follow-up fixes). Its commit comment refers to a hidden test i_innodb.innodb_bug14712710 that I wrote for the MySQL 5.6 Bug#14712710 BOGUS DUPLICATE KEY ERROR DURING ONLINE ALTER TABLE. That test starts 6 auxiliary client connections, running one online ALTER TABLE (or possibly some concurrent DML) in each of them. If I remember correctly, the DEBUG_SYNC communication between the default connection and these "worker connections" consists of a signal being sent at the start and at the end of the work. Sometimes, a DEBUG_SYNC signal would be lost at the end, causing a test failure.

            marko Marko Mäkelä added a comment

            At a glance this looks like a deadlock between a mutex and an rwlock. Unfortunately there is no information in this dump about which thread holds the DICT_SYS mutex.

            svoj Sergey Vojtovich added a comment

            oh, if there's DEBUG_SYNC involved... it may be the culprit, of course

            svoj Sergey Vojtovich added a comment

            svoj, the problem is that fil_reinit_space_header() is stuck waiting for a DEBUG_SYNC signal that was already sent by the default connection in the test:

            	DEBUG_SYNC_C("buffer_pool_scan");
            	buf_LRU_flush_or_remove_pages(id, BUF_REMOVE_ALL_NO_WRITE, 0);
            	btr_search_s_unlock_all();
            

            The lost DEBUG_SYNC signal is blocking the btr_search_x_lock_all() call for con2 (set global innodb_adaptive_hash_index=off).
            As far as I understand, the test is expected to cause this blocking, but normally buf_LRU_flush_or_remove_pages() would be allowed to execute, and it would complete so fast that no "long semaphore wait" is reported.

            marko Marko Mäkelä added a comment

            The test appears to fail randomly.
            Here is a test failure on non-embedded (debug sync point wait timed out).

            marko Marko Mäkelä added a comment

            The initial fix of cleaning up the wrong DEBUG_SYNC usage did not make the problem go away entirely.
            There are sporadic failures of the test innodb.truncate_debug in 10.2 and especially 10.3, which I believe cannot be due to anything other than a lost DEBUG_SYNC signal. (See my earlier comments.)

            marko Marko Mäkelä added a comment
            elenst Elena Stepanova added a comment (edited)

            Regarding innodb.truncate_debug, the statistics for 10.3 and 10.2 are essentially different.
            On 10.3-based trees, the test fails with this "debug sync point wait timed out" diagnostic very frequently, many times a day, on various debug builders (fulltest, p8, windows).

            However, on 10.2 the last recorded failure of the test is from 2017-06-29 (plus a few more on 10.2-based trees which had apparently been rebased a long time ago); so whatever was causing it there must have been fixed since then. It could be a different problem altogether, since it was mostly failing on Windows, and the frequency was also much lower than on 10.3.

            Back to 10.3: the failure is reproducible rather easily by re-running the sequence of tests, and further analysis suggests that it isn't caused by a lost signal; in general, "debug sync point wait timed out" is a rather misleading error here. So I suggest keeping this item closed and instead re-purposing MDEV-13256 for this new failure. I'll re-target it for 10.3 and put further details there.


            People

              serg Sergei Golubchik
              elenst Elena Stepanova
              Votes: 0
              Watchers: 3

