[MDEV-26755] innodb.undo_truncate: ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed. Created: 2021-10-04  Updated: 2022-07-27  Resolved: 2021-11-26

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.2.42, 10.3.33, 10.4.23

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: not-10.5

Issue Links:
Duplicate
is duplicated by MDEV-13644 encryption.innochecksum, innodb.innod... Closed
Relates
relates to MDEV-26672 innodb_undo_log_truncate may cause th... Closed

 Description   

The test innodb.undo_truncate, which in MDEV-26672 was updated to issue a slow shutdown, is causing assertion failures in 10.3 and 10.4. 10.5 and later are unaffected, likely thanks to MDEV-16264 removing thd_destructor_proxy.

10.2 is likely affected as well, but the test innodb.undo_truncate is not issuing a slow shutdown (innodb_fast_shutdown=0) there. Here is a sample failure:

10.3 333d6c30f87a7862a2d6ca379c49e2ea46451ebc

innodb.undo_truncate '2,32k,innodb'      w3 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2021-09-29 17:16:03
line
mysqld: /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_list.h:642: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.
Attempting backtrace. You can use the following information to find out
mysqld: /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_list.h:642: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.
^ Found warnings in /mnt/buildbot/build/mariadb-10.3.32/mysql-test/var/3/log/mysqld.1.err
#4  0xb703dd8b in __assert_fail () from /lib/i386-linux-gnu/libc.so.6
#5  0x803fb73f in ilink::assert_linked (this=0xb23021a0) at /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_list.h:642
#6  0x803fcd37 in unlink_not_visible_thd (thd=0xb23021a0) at /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_class.h:4906
#7  0x804b8357 in destroy_thd (thd=0xb23021a0) at /home/buildbot/buildbot/build/mariadb-10.3.32/sql/sql_class.cc:4792
#8  0x80b635ab in thd_destructor_proxy () at /home/buildbot/buildbot/build/mariadb-10.3.32/storage/innobase/handler/ha_innodb.cc:343



 Comments   
Comment by Vladislav Vaintroub [ 2021-10-04 ]

I'm not sure how can I help in there. This happens since something was modified recently in Innodb. It does not happen with anything else. IF there is a founded suspicion that something (what) does not work, I wholehearteadly welcome this analysis

it is not the first time Innodb_fast_shutdown is being used in tests

C:\work\10.3\mysql-test>git grep innodb_fast_shutdown | findstr .test | wc -l
94

Comment by Elena Stepanova [ 2021-10-04 ]

Various tests have been failing with the same assertion failure for a long time (MDEV-13644), although I don't know if it's related to slow shutdown and/or otherwise is the same problem.

Comment by Marko Mäkelä [ 2021-10-12 ]

While the tests that have been named in MDEV-13644 do not involve shutdown with innodb_fast_shutdown=0, I believe that these failures could share a root cause with that. I think that the correct assignee of this issue is the author of thd_destructor_proxy.

Comment by Sergei Golubchik [ 2021-11-03 ]

This is what's happening. The server starts shutting down, it sends a signal to all threads, including thd_destructor_proxy, the latter starts srv_purge_shutdown(). But InnoDB is too busy purging, all workers purge something and don't react immediately. The server does this:

  /*
    Give threads time to die.
 
    In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
    as 2 seconds, depending on thread scheduling.
 
    From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
    that on a server with heavy I/O load, it is quite possible for eg. an
    fsync() of the binlog or whatever to cause something like LOCK_log to be
    held for more than 2 seconds. We do not want to force kill threads in
    such cases, if it can be avoided. Note that normally, the wait will be
    much smaller than even 2 seconds, this is only a safety fallback against
    stuck threads so server shutdown is not held up forever.
  */
  DBUG_PRINT("info", ("thread_count: %d", thread_count));
 
  for (int i= 0; *(volatile int32*) &thread_count && i < 1000; i++)
    my_sleep(20000);

But in this test it's not enough and the server starts unlinking threads one by one (with threads.get()) and forcibly close their open connections. InnoDB threads have no connection to close, but they're unlinked too, and that triggers an assert when they finally detect the shutdown request and try to unlink, asserting first that they're still linked.

Generated at Thu Feb 08 09:47:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.