MariaDB Server, MDEV-22778

Slow InnoDB shutdown on large instance

Details

    Description

      Shutting down a large Windows instance (innodb_buffer_pool_size=384GB, > 500K tables) takes much longer with 10.4 than it did with 10.3.

      Shutdown takes around 10 minutes even with innodb_fast_shutdown=1 or =3.

      innodb_defragment was originally enabled. After turning it off, shutdown was faster than with defragmentation enabled but still took several minutes.

      Nothing special in the error log besides the message "[Note] InnoDB: Waiting for master thread to exit", which is logged once per minute.

      Attachments

        1. image-2020-07-14-19-18-54-693.png (851 kB)
        2. image-2020-07-14-19-18-55-706.png (851 kB)
        3. mdev-11778-workaround.patch (5 kB)
        4. mdev-22778.sql.gz (9.38 MB)
        5. screenshot-1.png (535 kB)
        6. screenshot-2.png (851 kB)

          Activity

            hholzgra and wlad, thank you for confirming this as a 10.4 regression.

            I think that we could just remove the zip_pad altogether. The code was originally added to improve ROW_FORMAT=COMPRESSED write performance, back when Facebook was an active user, before they invested effort in MyRocks. I have been thinking of removing write support for ROW_FORMAT=COMPRESSED in MariaDB Server 10.6 (MDEV-22367), to simplify the buffer pool code further. I think that it should be OK to lose some write performance on that obscure format.

            It looks like the dict_table_t::autoinc_mutex could be eliminated by changing the data type of dict_table_t::autoinc to Atomic_relaxed<uint64_t>. The trickiest part to replace seems to be ha_innobase::innobase_lock_autoinc().

            marko Marko Mäkelä added the comment above.
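To illustrate the idea of replacing a mutex-protected counter with a relaxed atomic, here is a minimal sketch. It uses std::atomic<uint64_t> with relaxed ordering as a stand-in for Atomic_relaxed<uint64_t>; the struct and function names are hypothetical and are not InnoDB code. Plain reads and block reservations map directly onto atomic operations, while a conditional update needs a compare-and-swap loop. Any caller that relies on holding a lock across several steps would not be covered by this, which is an assumption about why the locking function is the hard part.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a table's auto-increment state (not dict_table_t).
struct AutoincSketch {
  std::atomic<uint64_t> autoinc{0};
};

// A plain read needs no mutex when relaxed ordering is sufficient.
inline uint64_t autoinc_read(const AutoincSketch& t) {
  return t.autoinc.load(std::memory_order_relaxed);
}

// Raise the counter to at least `value`. This is a read-modify-write,
// so without a mutex it needs a compare-and-swap loop.
inline void autoinc_update_if_greater(AutoincSketch& t, uint64_t value) {
  uint64_t cur = t.autoinc.load(std::memory_order_relaxed);
  while (cur < value &&
         !t.autoinc.compare_exchange_weak(cur, value,
                                          std::memory_order_relaxed)) {
    // On failure, `cur` holds the latest value; the loop re-checks it.
  }
}

// Reserve a block of `n` consecutive values; returns the first one.
inline uint64_t autoinc_reserve(AutoincSketch& t, uint64_t n) {
  return t.autoinc.fetch_add(n, std::memory_order_relaxed);
}
```

Each function above is atomic on its own, but a sequence such as "read, compute, then update" is not atomic as a whole, which is what a mutex would otherwise provide.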

            I will try to see if I can eliminate the two mutexes.

            marko Marko Mäkelä added the comment above.

            @Marko, maybe you can eliminate the "mutex deregister" stuff on shutdown (for the latch counters and whatnot): instead of removing entries from vectors one by one, clear the vectors at once, or just let C++ call the static destructors.

            wlad Vladislav Vaintroub added the comment above.
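The difference between deregistering entries one by one and clearing a vector at once can be sketched as follows. The registry type and its members are hypothetical and only illustrate the cost pattern; they do not reflect the actual InnoDB latch counter code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical registry of per-mutex counters (illustrative names only).
struct CounterRegistry {
  std::vector<int*> counters;

  void add(int* c) { counters.push_back(c); }

  // Per-object deregistration: each call is O(n) because of the search
  // and the element shifting in erase(), so n calls cost O(n^2) overall.
  void remove_one(int* c) {
    auto it = std::find(counters.begin(), counters.end(), c);
    if (it != counters.end()) counters.erase(it);
  }

  // Bulk teardown: drops every entry in a single pass.
  void remove_all() { counters.clear(); }
};

// Deregisters every element of `storage` individually and returns the
// number of entries left in the registry.
inline std::size_t teardown_one_by_one(CounterRegistry& r,
                                       std::vector<int>& storage) {
  for (int& c : storage) r.remove_one(&c);
  return r.counters.size();
}
```

Both paths leave the registry empty. With several hundred thousand tables, and so at least as many registered entries, the quadratic path is the kind of cost that could show up as a slow shutdown.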

            Removing zip_pad would also disable two configuration parameters: innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max. It feels wrong to do that in a GA release.

            It does not seem easy to guarantee the correctness of ha_innobase::innobase_lock_autoinc() and its callers if we remove dict_table_t::autoinc_mutex.

            I think that the least intrusive change would be to replace the two problematic mutexes with std::mutex or something that does not need explicit or time-consuming destruction. Their typical hold time is very short, and the InnoDB instrumentation for deadlocks should not be needed.

            marko Marko Mäkelä added the comment above.
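As a rough sketch of the "least intrusive" option, the following shows a per-table object whose short critical section is guarded by a plain std::mutex. The struct, its members, and the helper function are hypothetical; the point is only that such a mutex is not entered into any registry, so destroying many objects involves no deregistration step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical per-table object (not dict_table_t).
struct TableWithStdMutex {
  std::mutex autoinc_mutex;  // plain mutex: no registration, no deadlock instrumentation
  uint64_t autoinc = 0;

  // Very short critical section, as described for the real mutexes.
  uint64_t next_autoinc() {
    std::lock_guard<std::mutex> guard(autoinc_mutex);
    return ++autoinc;
  }
};

// Creates `n` tables, takes one value from each, then destroys them all.
// Returns the sum of the values handed out.
inline uint64_t create_use_destroy(std::size_t n) {
  std::vector<std::unique_ptr<TableWithStdMutex>> tables;
  tables.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    tables.push_back(std::make_unique<TableWithStdMutex>());

  uint64_t total = 0;
  for (auto& t : tables) total += t->next_autoinc();

  tables.clear();  // destructors run here; nothing to deregister
  return total;
}
```

The trade-off named in the comment still applies: a plain std::mutex gives up the InnoDB deadlock instrumentation, which is acceptable only because the hold times are very short.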
            marko Marko Mäkelä added a comment (edited):

            The RelWithDebInfo build completed the following test case in 16 seconds (not counting bootstrap) when using the MyISAM engine on /dev/shm:

            --source include/have_innodb.inc
            DELIMITER //;
            FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int) ENGINE=MyISAM;"); end for; //
            DELIMITER ;//
            --source include/restart_mysqld.inc
            

            With ENGINE=InnoDB and my fix of using std::mutex, it completed in 62 seconds.
            With ENGINE=InnoDB and without my fix, it completed in 69 seconds.
            Creating the tables takes the majority of the time (59 seconds).
            The shutdown and restart without the fix would take about 9 seconds, and with the fix, about 3 seconds.

            I think that implementing this fix is sufficient for now. We might look at speeding up innodb_fast_shutdown=2 and innodb_fast_shutdown=3 in non-debug builds later if it is truly needed.


            People

              marko Marko Mäkelä
              hholzgra Hartmut Holzgraefe