MDEV-27868 (MariaDB Server)

buf_pool.flush_list is in the wrong order

    Description

      In MDEV-27774, log_sys.mutex and log_sys.flush_order_mutex were replaced with a shared log_sys.latch. As a result, concurrently executing mtr_t::commit() calls may insert blocks into buf_pool.flush_list at roughly the same time. Each individual insertion is still protected by buf_pool.flush_list_mutex.
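      The following single-threaded sketch illustrates why this matters. It is not InnoDB code: it assumes that each mini-transaction already has its LSN and that the order in which commits reach buf_pool.flush_list_mutex is independent of that LSN. The names FlushListModel, push_front_commit and replay_commits are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical model: the list stores only oldest_modification values,
// newest at the front, which is the order the validation assertion expects.
struct FlushListModel {
  std::list<lsn_t> blocks;

  // Naive insertion: always at the head. This keeps the list ordered
  // only if callers arrive in ascending LSN order.
  void push_front_commit(lsn_t lsn) { blocks.push_front(lsn); }

  // True if every element is >= the element that follows it.
  bool is_descending() const {
    auto it = blocks.begin();
    if (it == blocks.end()) return true;
    for (auto next = std::next(it); next != blocks.end(); ++it, ++next)
      if (*it < *next) return false;
    return true;
  }
};

// Replays commits in the order in which they acquired the (modelled) mutex.
FlushListModel replay_commits(const std::vector<lsn_t>& arrival_order) {
  FlushListModel model;
  for (lsn_t lsn : arrival_order) model.push_front_commit(lsn);
  return model;
}
```

      In this model, replay_commits({100, 200}) yields an ordered list, whereas replay_commits({200, 100}) does not, because the commit with the larger LSN reached the mutex first.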

      In buf_pool_t::insert_into_flush_list() we attempted to compensate for this by searching for an appropriate insert position instead of unconditionally inserting each block at the head of buf_pool.flush_list. This compensation does not always work. With the following command, I was able to reproduce a crash on my system:

      ./mtr --parallel=auto --repeat=100 encryption.innodb_encryption_filekeys{,,,,}{,,,}
      

      10.8 8251a9fb93075a72074bd7fd10faee5165014b7f

      encryption.innodb_encryption_filekeys 'cbc,innodb' w1 [ 26 fail ]
              Test ended at 2022-02-17 10:28:31
       
      CURRENT_TEST: encryption.innodb_encryption_filekeys
       
       
      Server [mysqld.1 - pid: 488142, winpid: 488142, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      2022-02-17 10:28:30 104 [Note] InnoDB: Creating #1 encryption thread id 140526397404736 total threads 4.
      2022-02-17 10:28:30 104 [Note] InnoDB: Creating #2 encryption thread id 140526389012032 total threads 4.
      2022-02-17 10:28:30 104 [Note] InnoDB: Creating #3 encryption thread id 140526405797440 total threads 4.
      2022-02-17 10:28:30 104 [Note] InnoDB: Creating #4 encryption thread id 140526414190144 total threads 4.
      mariadbd: /mariadb/10.8/storage/innobase/buf/buf0flu.cc:2538: void buf_flush_validate_low(): Assertion `om == 1 || !bpage || __builtin_expect(recv_sys.recovery_on, (0)) || om >= bpage->oldest_modification()' failed.
      

      The assertion reports that buf_pool.flush_list is not ordered by buf_page_t::oldest_modification(), as it must be.
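      As a reading aid, the condition in the assertion message can be modelled on a plain sequence of oldest_modification values, listed from the head of the list. This is a simplified sketch and not the actual buf_flush_validate_low(); the function name flush_list_is_ordered and the recovery_on parameter (standing in for recv_sys.recovery_on) are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Mirrors the asserted condition for each adjacent pair (om, next):
//   om == 1 || !bpage || recovery_on || om >= bpage->oldest_modification()
// The "!bpage" case (no next element) is covered by the loop bound.
bool flush_list_is_ordered(const std::vector<lsn_t>& oms,
                           bool recovery_on = false) {
  for (std::size_t i = 0; i + 1 < oms.size(); ++i) {
    const lsn_t om = oms[i];
    const lsn_t next_om = oms[i + 1];
    if (!(om == 1 || recovery_on || om >= next_om)) return false;
  }
  return true;
}
```

      Under this model, {300, 200, 200, 100} passes and {100, 200} fails, which corresponds to the kind of ordering violation reported above.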

      The impact of this bug is that log checkpoints, and thus crash recovery and backup, may work incorrectly.

          Activity

            Another failure:

            CURRENT_TEST: encryption.innodb-checksum-algorithm
            mysqltest: At line 53: query 'ALTER TABLE tc DISCARD TABLESPACE' failed: <Unknown> (2013): Lost connection to server during query
            buf/buf0flu.cc:2585(buf_flush_validate_low())[0x55d918032422]
            buf/buf0flu.cc:112(buf_flush_validate_skip())[0x55d9180367fd]
            buf/buf0flu.cc:220(buf_pool_t::delete_from_flush_list(buf_page_t*, bool))[0x55d918036a4c]
            buf/buf0flu.cc:254(buf_flush_remove_pages(unsigned int))[0x55d917ef81d5]
            row/row0mysql.cc:2543(row_discard_tablespace_for_mysql(dict_table_t*, trx_t*))[0x55d917d74fc3]
            

            marko Marko Mäkelä added a comment

            I had forgotten that since MDEV-25113, buf_pool.flush_list may contain clean blocks, which are identified by buf_page_t::oldest_modification()==1. Those blocks must be removed or disregarded when buf_pool_t::insert_into_flush_list() determines the correct insert position.

            marko Marko Mäkelä added a comment
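            A minimal sketch of the "remove" variant of that idea follows. It is not the actual fix: it models the flush list as a std::list of oldest_modification values with the newest at the front, and the names CLEAN and insert_into_flush_list_model are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

using lsn_t = std::uint64_t;

// In this model, an oldest_modification value of 1 marks a clean block
// that is still linked into the list.
constexpr lsn_t CLEAN = 1;

// Inserts lsn so that dirty entries stay in descending order from the
// front. Clean entries met during the search are erased, so they
// cannot be mistaken for the insert position.
void insert_into_flush_list_model(std::list<lsn_t>& flush_list, lsn_t lsn) {
  assert(lsn != CLEAN);  // a newly dirtied block carries a real LSN
  auto it = flush_list.begin();
  while (it != flush_list.end()) {
    if (*it == CLEAN) {
      it = flush_list.erase(it);  // drop the clean block, keep searching
      continue;
    }
    if (*it <= lsn) break;  // first dirty entry that is not newer than lsn
    ++it;
  }
  flush_list.insert(it, lsn);  // insert before that entry, or at the end
}
```

            For example, inserting 250 into {300, 1, 200, 1, 100} gives {300, 250, 200, 1, 100}: the clean entry in front of 200 is erased during the search, while the one behind the insert position is never visited.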

            People

              marko Marko Mäkelä