MariaDB Server / MDEV-36082

Race condition between log_t::resize_start() and log_t::resize_abort()

Details

    Description

      mleich produced an rr replay trace where SET GLOBAL innodb_log_file_size leads to a debug assertion failure:

      10.11-MDEV-29445

      mariadbd: /data/Server/10.11-MDEV-29445/storage/innobase/log/log0log.cc:972: lsn_t log_t::write_buf() [with resizing_and_latch resizing = log_t::RESIZING; lsn_t = long unsigned int]: Assertion `resizing == RETAIN_LATCH || (resizing == RESIZING) == (resize_in_progress() > 1)' failed.
      

      We have log_sys.resize_lsn = 1, which means that log resizing is only about to start, and hence log_t::writer_update() had better not yet have assigned log_sys.writer=log_writer_resizing.
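The failing condition can be checked in isolation. The following is a minimal sketch, not the InnoDB code. It assumes that resize_in_progress() simply reports log_sys.resize_lsn, and the enumerator NOT_RESIZING is a hypothetical name for the "no resizing" mode. Only RETAIN_LATCH and RESIZING appear in the assertion message.

```cpp
#include <cstdint>

using lsn_t = std::uint64_t;

// Stand-in for log_t::resizing_and_latch. NOT_RESIZING is an assumed name.
enum resizing_and_latch { RETAIN_LATCH, NOT_RESIZING, RESIZING };

// Mirrors the asserted expression; resize_lsn stands in for the value
// returned by resize_in_progress() (an assumption of this sketch).
constexpr bool write_buf_invariant(resizing_and_latch resizing,
                                   lsn_t resize_lsn)
{
  return resizing == RETAIN_LATCH ||
         (resizing == RESIZING) == (resize_lsn > 1);
}

// The state from the trace: the resizing writer is active while
// resize_lsn is still 1, so the condition does not hold.
static_assert(!write_buf_invariant(RESIZING, 1),
              "resizing writer must not run before resizing has started");
// Once resizing has actually started, the resizing writer is expected.
static_assert(write_buf_invariant(RESIZING, 2),
              "resizing writer is consistent with resize_lsn > 1");
```

Under these assumptions, the only way to reach the failure with resize_lsn = 1 is for the resizing writer to have been installed too early, or for resize_lsn to have been reset to 1 by a different resize attempt while the writer was still in place.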

      In this trace, one log resizing is about to start, and another is being interrupted:

      SET GLOBAL innodb_log_file_size = 104857600 + 52428800 /* E_R Thread2 QNO 466 CON_ID 195 */ ;
      SET GLOBAL innodb_log_file_size = 104857600 /* E_R Thread2 QNO 454 CON_ID 192 */ ;
      

          Activity

            debarun Debarun Banerjee added a comment - marko, this fix would help resolve the assertion failure, but the root of the problem seems to lie elsewhere: the constraint that "I should abort my own resize" is violated. Unless that is fixed, it could cause other issues; for example, we could abort a future resize operation, which would still return success to the end user.

            I have added more details in https://github.com/MariaDB/server/pull/3835 for your consideration.

            marko Marko Mäkelä added a comment - I ended up introducing a state variable log_sys.resize_initiator in order to avoid another potential glitch when one SET GLOBAL innodb_log_file_size fails to notice that the log resizing had completed, and then a subsequent SET GLOBAL innodb_log_file_size was started. By keeping track of the thread that initiated the log resizing, we can make sure that only that thread will wait for its own log resizing to be completed.
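The idea of tracking the initiating thread can be illustrated with a small standalone model. This is a hedged sketch only: the class ResizeTracker and its members are hypothetical and do not reproduce log_sys.resize_initiator or any InnoDB latching. It shows just the rule that a resize may be aborted only by the thread that started it.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical model of "only the initiator may abort its own resize".
// None of these names come from the MariaDB source.
class ResizeTracker
{
public:
  // Returns true if the calling thread became the initiator of a resize.
  bool start(std::uint64_t target_size)
  {
    std::lock_guard<std::mutex> g(mutex_);
    if (in_progress_)
      return false;               // another resize is already running
    in_progress_ = true;
    initiator_ = std::this_thread::get_id();
    target_size_ = target_size;
    return true;
  }

  // Aborts the resize only when called by the thread that started it.
  bool abort()
  {
    std::lock_guard<std::mutex> g(mutex_);
    if (!in_progress_ || initiator_ != std::this_thread::get_id())
      return false;               // not our resize: leave it alone
    in_progress_ = false;
    initiator_ = std::thread::id();
    return true;
  }

  bool in_progress() const
  {
    std::lock_guard<std::mutex> g(mutex_);
    return in_progress_;
  }

private:
  mutable std::mutex mutex_;
  bool in_progress_ = false;
  std::thread::id initiator_;     // default value means "no thread"
  std::uint64_t target_size_ = 0;
};
```

In this model, an abort() issued by a connection whose own resize has already finished is a no-op, so it cannot cancel a resize that a later SET GLOBAL innodb_log_file_size has started.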

            People

              marko Marko Mäkelä