Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26755

innodb.undo_truncate: ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.

Details

    Description

      The test innodb.undo_truncate, which in MDEV-26672 was updated to issue a slow shutdown, is causing assertion failures in 10.3 and 10.4. 10.5 and later are unaffected, likely thanks to MDEV-16264 removing thd_destructor_proxy.

      10.2 is likely affected as well, but the test innodb.undo_truncate is not issuing a slow shutdown (innodb_fast_shutdown=0) there. Here is a sample failure:

      10.3 333d6c30f87a7862a2d6ca379c49e2ea46451ebc

      innodb.undo_truncate '2,32k,innodb'      w3 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2021-09-29 17:16:03
      line
      mysqld: /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_list.h:642: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.
      Attempting backtrace. You can use the following information to find out
      mysqld: /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_list.h:642: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.
      ^ Found warnings in /mnt/buildbot/build/mariadb-10.3.32/mysql-test/var/3/log/mysqld.1.err
      #4  0xb703dd8b in __assert_fail () from /lib/i386-linux-gnu/libc.so.6
      #5  0x803fb73f in ilink::assert_linked (this=0xb23021a0) at /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_list.h:642
      #6  0x803fcd37 in unlink_not_visible_thd (thd=0xb23021a0) at /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_class.h:4906
      #7  0x804b8357 in destroy_thd (thd=0xb23021a0) at /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_class.cc:4792
      #8  0x80b635ab in thd_destructor_proxy () at /home/buildbot/buildbot/build/mariadb-10.3.32/storage/innobase/handler/ha_innodb.cc:343
      

      Attachments

        Issue Links

          Activity

            wlad Vladislav Vaintroub added a comment - - edited

            I'm not sure how can I help in there. This happens since something was modified recently in Innodb. It does not happen with anything else. IF there is a founded suspicion that something (what) does not work, I wholehearteadly welcome this analysis

            it is not the first time Innodb_fast_shutdown is being used in tests

            C:\work\10.3\mysql-test>git grep innodb_fast_shutdown | findstr .test | wc -l
            94
            

            wlad Vladislav Vaintroub added a comment - - edited I'm not sure how can I help in there. This happens since something was modified recently in Innodb. It does not happen with anything else. IF there is a founded suspicion that something (what) does not work, I wholehearteadly welcome this analysis it is not the first time Innodb_fast_shutdown is being used in tests C:\work\10.3\mysql-test>git grep innodb_fast_shutdown | findstr .test | wc -l 94

            Various tests have been failing with the same assertion failure for a long time (MDEV-13644), although I don't know if it's related to slow shutdown and/or otherwise is the same problem.

            elenst Elena Stepanova added a comment - Various tests have been failing with the same assertion failure for a long time ( MDEV-13644 ), although I don't know if it's related to slow shutdown and/or otherwise is the same problem.

            While the tests that have been named in MDEV-13644 do not involve shutdown with innodb_fast_shutdown=0, I believe that these failures could share a root cause with that. I think that the correct assignee of this issue is the author of thd_destructor_proxy.

            marko Marko Mäkelä added a comment - While the tests that have been named in MDEV-13644 do not involve shutdown with innodb_fast_shutdown=0 , I believe that these failures could share a root cause with that. I think that the correct assignee of this issue is the author of thd_destructor_proxy .

            This is what's happening. The server starts shutting down, it sends a signal to all threads, including thd_destructor_proxy, the latter starts srv_purge_shutdown(). But InnoDB is too busy purging, all workers purge something and don't react immediately. The server does this:

              /*
                Give threads time to die.
             
                In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
                as 2 seconds, depending on thread scheduling.
             
                From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
                that on a server with heavy I/O load, it is quite possible for eg. an
                fsync() of the binlog or whatever to cause something like LOCK_log to be
                held for more than 2 seconds. We do not want to force kill threads in
                such cases, if it can be avoided. Note that normally, the wait will be
                much smaller than even 2 seconds, this is only a safety fallback against
                stuck threads so server shutdown is not held up forever.
              */
              DBUG_PRINT("info", ("thread_count: %d", thread_count));
             
              for (int i= 0; *(volatile int32*) &thread_count && i < 1000; i++)
                my_sleep(20000);
            

            But in this test it's not enough and the server starts unlinking threads one by one (with threads.get()) and forcibly close their open connections. InnoDB threads have no connection to close, but they're unlinked too, and that triggers an assert when they finally detect the shutdown request and try to unlink, asserting first that they're still linked.

            serg Sergei Golubchik added a comment - This is what's happening. The server starts shutting down, it sends a signal to all threads, including thd_destructor_proxy , the latter starts srv_purge_shutdown() . But InnoDB is too busy purging, all workers purge something and don't react immediately. The server does this: /* Give threads time to die.   In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little as 2 seconds, depending on thread scheduling.   From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is that on a server with heavy I/O load, it is quite possible for eg. an fsync() of the binlog or whatever to cause something like LOCK_log to be held for more than 2 seconds. We do not want to force kill threads in such cases, if it can be avoided. Note that normally, the wait will be much smaller than even 2 seconds, this is only a safety fallback against stuck threads so server shutdown is not held up forever. */ DBUG_PRINT( "info" , ( "thread_count: %d" , thread_count));   for ( int i= 0; *( volatile int32*) &thread_count && i < 1000; i++) my_sleep(20000); But in this test it's not enough and the server starts unlinking threads one by one (with threads.get() ) and forcibly close their open connections. InnoDB threads have no connection to close, but they're unlinked too, and that triggers an assert when they finally detect the shutdown request and try to unlink, asserting first that they're still linked.

            People

              serg Sergei Golubchik
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.