MariaDB Server / MDEV-24448

mariabackup: storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.

    Description

      mariabackup.apply-log-only-incr 'innodb' w19 [ 2 pass ]  65328
      mariabackup.apply-log-only-incr 'innodb' w8 [ 2 fail ]
              Test ended at 2020-12-17 19:37:54
       
      CURRENT_TEST: mariabackup.apply-log-only-incr
      ...
      /data/Server_bin/10.6C_debug/bin/mariabackup based on MariaDB server 10.6.0-MariaDB Linux (x86_64)
      [00] 2020-12-17 19:36:51 incremental backup from 62952 is enabled.
      [00] 2020-12-17 19:36:51 cd to /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup/
      [00] 2020-12-17 19:36:56 This target seems to be already prepared.
      [00] 2020-12-17 19:36:57 mariabackup: using the following InnoDB configuration for recovery:
      [00] 2020-12-17 19:36:57 innodb_data_home_dir = .
      [00] 2020-12-17 19:36:57 innodb_data_file_path = ibdata1:12M:autoextend
      [00] 2020-12-17 19:36:57 innodb_log_group_home_dir = /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1/
      [00] 2020-12-17 19:36:57 InnoDB: Using Linux native AIO
      [00] 2020-12-17 19:36:57 mariabackup: Generating a list of tablespaces
      [00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//ibdata1.delta is 16384 bytes
      [00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//ibdata1.delta to ./ibdata1...
      [00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//test/t.ibd.delta is 16384 bytes
      [00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//test/t.ibd.delta to ./test/t.ibd...
      [00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/transaction_registry.ibd.delta is 16384 bytes
      [00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/transaction_registry.ibd.delta to ./mysql/transaction_registry.ibd...
      [00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_table_stats.ibd.delta is 16384 bytes
      [00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_table_stats.ibd.delta to ./mysql/innodb_table_stats.ibd...
      [00] 2020-12-17 19:36:57 page size for /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_index_stats.ibd.delta is 16384 bytes
      [00] 2020-12-17 19:36:57 Applying /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1//mysql/innodb_index_stats.ibd.delta to ./mysql/innodb_index_stats.ibd...
      [00] 2020-12-17 19:36:57 mariabackup: using the following InnoDB configuration for recovery:
      [00] 2020-12-17 19:36:57 innodb_data_home_dir = .
      [00] 2020-12-17 19:36:57 innodb_data_file_path = ibdata1:12M:autoextend
      [00] 2020-12-17 19:36:57 innodb_log_group_home_dir = /data/Server_bin/10.6C_debug/mysql-test/var/8/tmp/backup_inc1/
      [00] 2020-12-17 19:36:57 InnoDB: Using Linux native AIO
      [00] 2020-12-17 19:36:57 Starting InnoDB instance for recovery.
      [00] 2020-12-17 19:36:57 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
      2020-12-17 19:36:57 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
      2020-12-17 19:36:57 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2020-12-17 19:36:57 0 [Note] InnoDB: Number of pools: 1
      2020-12-17 19:36:57 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2020-12-17 19:36:57 0 [Warning] InnoDB: Linux Native AIO disabled.
      2020-12-17 19:36:57 0 [Note] InnoDB: Initializing buffer pool, total size = 104857600, chunk size = 104857600
      2020-12-17 19:36:57 0 [Note] InnoDB: Completed initialization of buffer pool
      2020-12-17 19:36:57 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=62995,67272
      FILE_CHECKPOINT(62995) read at 67272
      FILE_CHECKPOINT(62995) reread at 67272
      2020-12-17 19:36:57 0 [Note] InnoDB: Starting final batch to recover 4 pages from redo log.
      2020-12-17 19:36:57 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2020-12-17 19:36:57 0 [Note] InnoDB: Applying log to page [page id: space=0, page number=312]
      2020-12-17 19:36:57 0 [Note] InnoDB: Applying log to page [page id: space=0, page number=313]
      2020-12-17 19:36:57 0 [Note] InnoDB: Applying log to page [page id: space=6, page number=3]
      2020-12-17 19:36:59 0 [Note] InnoDB: Applying log to page [page id: space=0, page number=51]
      mariabackup: /Server/10.6C/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.
       
      #11 <signal handler called>
      #12 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #13 0x0000319e1b0b9859 in __GI_abort () at abort.c:79
      #14 0x0000319e1b0b9729 in __assert_fail_base (fmt=0x319e1b24f588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x564864f8dc66 "!buf_pool.any_io_pending()", file=0x564864f8dbc0 "/Server/10.6C/storage/innobase/srv/srv0start.cc", line=1582, function=<optimized out>) at assert.c:92
      #15 0x0000319e1b0caf36 in __GI___assert_fail (assertion=0x564864f8dc66 "!buf_pool.any_io_pending()", file=0x564864f8dbc0 "/Server/10.6C/storage/innobase/srv/srv0start.cc", line=1582, function=0x564864f8e7a9 "dberr_t srv_start(bool)") at assert.c:101
      #16 0x000056486481d5d6 in srv_start (create_new_db=false) at /Server/10.6C/storage/innobase/srv/srv0start.cc:1582
      #17 0x0000564863ba1544 in innodb_init () at /Server/10.6C/extra/mariabackup/xtrabackup.cc:2175
      #18 0x0000564863bac6be in xtrabackup_prepare_func (argv=0x51906fb7d308) at /Server/10.6C/extra/mariabackup/xtrabackup.cc:5739
      #19 0x0000564863baf094 in main_low (argv=0x51906fb7d308) at /Server/10.6C/extra/mariabackup/xtrabackup.cc:6643
      #20 0x0000564863bae8a7 in main (argc=5, argv=0x7ffe969c70e8) at /Server/10.6C/extra/mariabackup/xtrabackup.cc:6435
       
      origin/bb-10.6-MDEV-24426, origin/HEAD, origin/10.6, 10.6 c36a2a0d1c03900796dc35c01a745bec8b1b54e2 2020-12-17T14:56:08+02:00
      
      

          Activity

            mleich Matthias Leich added a comment:

            pluto:/data/Server_bin/10.6C_debug/mysql-test/rr_trace_dir
            _RR_TRACE_DIR="." rr replay --mark-stdio -p 1026397

            marko Marko Mäkelä added a comment:

            I was concurrently filing MDEV-24449 for this same issue.

            marko Marko Mäkelä added a comment:

            At the time of the assertion failure, the following thread is executing; it is the one that would decrement buf_pool.n_pend_reads from 1 to 0 once it finishes. Apparently, no log needed to be applied to any of the 4 pages for which redo log existed during this run (the pages were already up to date).

            10.6 c36a2a0d1c03900796dc35c01a745bec8b1b54e2

            #3  0x00005648646d259f in inline_mysql_mutex_unlock (that=0x5648655b0380 <recv_sys>, src_file=0x564864f17af0 "/Server/10.6C/storage/innobase/log/log0recv.cc", src_line=2515) at /Server/10.6C/include/mysql/psi/mysql_thread.h:831
            #4  0x00005648646db2f2 in recv_recover_page (space=0x51906fbab318, bpage=0xbc0250f52a0) at /Server/10.6C/storage/innobase/log/log0recv.cc:2515
            #5  0x000056486452a746 in buf_page_read_complete (bpage=0xbc0250f52a0, node=...) at /Server/10.6C/storage/innobase/buf/buf0buf.cc:4068
            #6  0x00005648645a8936 in fil_aio_callback (request=...) at /Server/10.6C/storage/innobase/fil/fil0fil.cc:3366
            #7  0x0000564864700820 in io_callback (cb=0x51906fc30050) at /Server/10.6C/storage/innobase/os/os0file.cc:3879
            #8  0x000056486491aa2f in tpool::simulated_aio::simulated_aio_callback (param=0x51906fc30050) at /Server/10.6C/tpool/aio_simulated.cc:162
            

            The execution of recv_recover_page() is practically over:

            	mysql_mutex_unlock(&recv_sys.mutex);
            	ut_ad(mtr.has_committed());
            }
            

            At the end of the calling function buf_page_read_complete(), we have the following code:

              if (recv_recovery_is_on())
                recv_recover_page(node.space, bpage);
             
              if (bpage->state() == BUF_BLOCK_FILE_PAGE && !recv_no_ibuf_operations &&
                  (!id.space() || !is_predefined_tablespace(id.space())) &&
                  fil_page_get_type(frame) == FIL_PAGE_INDEX &&
                  page_is_leaf(frame))
                bpage->ibuf_exist= true;
             
              if (UNIV_UNLIKELY(MONITOR_IS_ON(MONITOR_MODULE_BUF_PAGE)))
                buf_page_monitor(bpage, BUF_IO_READ);
              DBUG_PRINT("ib_buf", ("read page %u:%u",
                                    id.space(), id.page_no()));
             
              /* Because this thread which does the unlocking might not be the same that
              did the locking, we use a pass value != 0 in unlock, which simply
              removes the newest lock debug record, without checking the thread id. */
              if (bpage->state() == BUF_BLOCK_FILE_PAGE)
              {
                buf_block_t *block= reinterpret_cast<buf_block_t*>(bpage);
                block->lock.x_unlock(true);
              }
              bpage->io_unfix();
             
              ut_d(auto n=) buf_pool.n_pend_reads--;
              ut_ad(n > 0);
              buf_pool.stat.n_pages_read++;
             
              return DB_SUCCESS;
            }
            

            Contrary to what I initially wrote in MDEV-24449, I no longer think that this could result in redo log apply being skipped. The function recv_sys_t::apply() will wait for recv_sys.pages.empty() to hold. Any other potential bad effect (in the code quoted above) seems unlikely. Any concurrent access to the block (for allocating it for something else) should be prevented by the exclusive page latch that is being held until the end. Even a concurrent call to buf_pool_invalidate() would not seem to pose a real problem.
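
            The interleaving can be shown as a small standalone model. In the following sketch (all names invented; this is not InnoDB code), one thread stands in for the read-completion path, which removes the page from the recovery map under the mutex but decrements the pending-read counter only after releasing it, and the other stands in for recv_sys_t::apply() plus the assertion in srv_start(). Run often enough, the assert fires in exactly the window described above:

            #include <atomic>
            #include <cassert>
            #include <chrono>
            #include <mutex>
            #include <set>
            #include <thread>

            std::mutex recv_mutex;                 // models recv_sys.mutex
            std::set<unsigned> pages{51};          // models recv_sys.pages
            std::atomic<unsigned> n_pend_reads{1}; // models buf_pool.n_pend_reads

            // Models buf_page_read_complete(): recv_recover_page() removes the
            // page under the mutex; the pending-read counter is decremented
            // only after the mutex has been released.
            void io_completion()
            {
              {
                std::lock_guard<std::mutex> g(recv_mutex);
                pages.erase(51);
              }
              // <-- the window: pages is empty, but the read is still "pending"
              std::this_thread::sleep_for(std::chrono::milliseconds(1));
              n_pend_reads.fetch_sub(1, std::memory_order_relaxed);
            }

            // Models recv_sys_t::apply() waiting for the page map to drain,
            // followed by the assertion in srv_start().
            void startup()
            {
              for (;;)
              {
                std::lock_guard<std::mutex> g(recv_mutex);
                if (pages.empty())
                  break;                           // "all pages recovered"
              }
              // srv_start(): ut_a(!buf_pool.any_io_pending()) can fire here
              assert(n_pend_reads.load(std::memory_order_relaxed) == 0);
            }

            int main()
            {
              std::thread a(io_completion), b(startup);
              a.join();
              b.join();
            }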

            I was unable to reproduce this failure on my local system, on either 10.5 or 10.6, with or without the fix, using 25×100 repetitions. On buildbot.askmonty.org, no failures had occurred before the custom InnoDB mutex implementation was eliminated (MDEV-21452). The failure seems to be very sensitive to timing and to compilation options.


            marko Marko Mäkelä added a comment:

            I made one more attempt at reproducing the assertion failure on 10.5, but the test failed to fail for me even with this help (the patch below sleeps for 1 ms after recv_sys.mutex is released once recv_sys.pages has become empty, and shortens the polling sleeps in recv_sys_t::apply() from 500 ms to 1 ms, to widen the window):

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index fe691a4b256..4d669472970 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -2480,7 +2480,9 @@ void recv_recover_page(fil_space_t* space, buf_page_t* bpage)
             
             	mtr.commit();
             func_exit:
            +	const bool sleep= recv_sys.pages.empty();
             	mutex_exit(&recv_sys.mutex);
            +	if (sleep) os_thread_sleep(1000);
             	ut_ad(mtr.has_committed());
             }
             
            @@ -2610,7 +2612,7 @@ void recv_sys_t::apply(bool last_batch)
                 if (abort)
                   return;
             
            -    os_thread_sleep(500000);
            +    os_thread_sleep(1000);
                 mutex_enter(&mutex);
               }
             
            @@ -2709,7 +2711,7 @@ void recv_sys_t::apply(bool last_batch)
             
                   if (abort)
                     return;
            -      os_thread_sleep(500000);
            +      os_thread_sleep(1000);
                   mutex_enter(&mutex);
                 }
               }
            


            marko Marko Mäkelä added a comment:

            One more test joins the club:

            10.6 30dc4287ec3d46bae7593f56383b9f3738e3c4e6

            CURRENT_TEST: mariabackup.system_versioning
            mysqltest: At line 14: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup  --prepare --target-dir=/mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/4/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11
            …
            2020-12-19  9:38:01 0 [Note] InnoDB: Starting final batch to recover 6 pages from redo log.
            mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.
            


            marko Marko Mäkelä added a comment:

            While this was not repeatable on 10.5, I pushed a fix there as well.
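
            The pushed fix itself is not quoted in this ticket. In terms of the standalone sketch above, its shape is for the startup side to also drain the pending-read counter before the assertion can be reached (a hypothetical illustration reusing the sketch's invented globals, not the actual patch):

            // Hypothetical illustration, not the actual patch: the startup
            // side additionally waits for in-flight reads to drain, which
            // closes the window shown in the earlier sketch.
            void startup_fixed()
            {
              for (;;)
              {
                std::lock_guard<std::mutex> g(recv_mutex);
                if (pages.empty())
                  break;
              }
              while (n_pend_reads.load(std::memory_order_relaxed) != 0)
                std::this_thread::yield();  // drain pending reads first
              assert(n_pend_reads.load(std::memory_order_relaxed) == 0); // now holds
            }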

            People

              Assignee: marko Marko Mäkelä
              Reporter: mleich Matthias Leich