Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23721

Assertion ‘node->table->is_temporary() || lock_table_has_locks(node->table)’ failed in row_undo_ins

Details

    Description

      ...
      2020-09-10  9:18:44 0 [Note] Recovering after a crash using mysql-bin
      2020-09-10  9:18:44 0 [Note] InnoDB: Loading buffer pool(s) from /dev/shm/vardir/1599750400/110/1/data/ib_buffer_pool
      2020-09-10  9:18:45 0 [Note] InnoDB: Buffer pool(s) load completed at 200910  9:18:45
      2020-09-10  9:18:47 0 [Note] InnoDB: To roll back: 4 transactions, 193 rows
      2020-09-10  9:18:48 0 [Note] Starting crash recovery...
      2020-09-10  9:18:48 0 [Note] Crash recovery finished.
      2020-09-10  9:18:52 0 [Note] InnoDB: Rolled back recovered transaction 102402
      mysqld: storage/innobase/row/row0uins.cc:532: dberr_t row_undo_ins(undo_node_t*, que_thr_t*): Assertion `node->table->is_temporary() || lock_table_has_locks(node->table)' failed.
      ...
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x0000405d491ac859 in __GI_abort () at abort.c:79
      #2  0x0000405d491ac729 in __assert_fail_base (fmt=0x405d49342588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x557283fde500 "node->table->is_temporary() || lock_table_has_locks(node->table)", 
          file=0x557283fde920 "storage/innobase/row/row0uins.cc", line=532, function=<optimized out>) at assert.c:92
      #3  0x0000405d491bdf36 in __GI___assert_fail (assertion=assertion@entry=0x557283fde500 "node->table->is_temporary() || lock_table_has_locks(node->table)", 
          file=file@entry=0x557283fde920 "storage/innobase/row/row0uins.cc", line=line@entry=532, function=function@entry=0x557283fde580 "dberr_t row_undo_ins(undo_node_t*, que_thr_t*)") at assert.c:101
      #4  0x00005572832d80ea in row_undo_ins (node=node@entry=0x61b000051020, thr=thr@entry=0x6170000311e8) at storage/innobase/include/dict0mem.h:1795
      #5  0x0000557282cf2b21 in row_undo (node=node@entry=0x61b000051020, thr=thr@entry=0x6170000311e8) at storage/innobase/row/row0undo.cc:433
      #6  0x0000557282cf7602 in row_undo_step (thr=thr@entry=0x6170000311e8) at /storage/innobase/row/row0undo.cc:494
      #7  0x0000557282b2b96b in que_thr_step (thr=thr@entry=0x6170000311e8) at storage/innobase/que/que0que.cc:945
      #8  0x0000557282b2c8dc in que_run_threads_low (thr=thr@entry=0x6170000311e8) at storage/innobase/que/que0que.cc:1009
      #9  0x0000557282b2d618 in que_run_threads (thr=0x6170000311e8) at storage/innobase/que/que0que.cc:1049
      #10 0x0000557282e1a38d in trx_rollback_active (trx=trx@entry=0x3002289a20b8) at storage/innobase/trx/trx0roll.cc:628
      #11 0x0000557282e223d6 in trx_rollback_recovered (all=all@entry=true) at storage/innobase/trx/trx0roll.cc:782
      #12 0x0000557282e234e1 in trx_rollback_all_recovered () at storage/innobase/trx/trx0roll.cc:836
      #13 0x00007c755c863609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #14 0x0000405d492a9103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr)
       
      Workflow
      ========
      1. Start the server, load some initial data
      2. Several concurrent sessions run a DDL/DML mix
      3. At some point of time (2. is ongoing) the server process gets killed with ABRT
      4. Attempt to restart that server
       
      MariaDB
      =======
      origin/bb-10.5-MDEV-23399 29e8b54c1289f212eb8e820fab584b0769c6c69e 2020-09-10T14:28:45+03:00
      The server claims to be a 10.5.6-MariaDB-debug-log.
       
      RQG
      ====
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      origin/experimental 26f374c2dbcebdbe72321f3728c564f1c327bba6 2020-09-02T13:10:22+02:00
       
      perl rqg.pl \
      --grammar=conf/mariadb/innodb_compression_encryption.yy \
      --gendata=conf/mariadb/innodb_compression_encryption.zz \
      --mysqld=--plugin-load-add=file_key_management.so \
      --mysqld=--loose-file-key-management-filename=/home/mleich/RQG/conf/mariadb/encryption_keys.txt \
      --mysqld=--innodb-encrypt-log \
      --mysqld=--innodb-encrypt-tables \
      --reporters=CrashRecovery1 \
      --redefine=conf/mariadb/redefine_innodb_undo.yy \
      --mysqld=--innodb-immediate-scrub-data-uncompressed=1 \
      --mysqld=--innodb_use_native_aio=1 \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --mysqld=--loose-innodb-sync-debug \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--loose-max-statement-time=30 \
      --threads=33 \
      --mysqld=--innodb_page_size=4K \
      --mysqld=--innodb-buffer-pool-size=8M \
      --duration=300 \
      --no_mask \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<local settings> \
      --script_debug=_nix_ \
      --rr=Server \
      --rr_options=--chaos
       
      Innodb compression and encryption are involved but its unknown if
      they are required for the current defect.
      The likelihood to replay the current problem is rather low.
      
      

      Attachments

        Issue Links

          Activity

            I performed an initial analysis of the trace. The problem appears to be that the name of the table does not match the tablespace until the very last call of fil_space_for_table_exists_in_mem(). That is why early calls of fil_ibd_open() returned NULL or !table->is_readable() held (due to !table->space) in trx_resurrect_table_locks(). However, the final call to fil_table_accessible() in row_undo_ins_parse_undo_rec() would hold, and upon rolling back the changes, we would notice that no table lock was resurrected.

            It seems to me that MDEV-23484 will require a follow-up fix to make it compatible with the rollback of RENAME operations (which was implemented in MDEV-14717).

            Because trx_resurrect_table_locks() is evicting the unreadable table, and because the table will be loaded with the correct name in dict_table_open_on_id() by row_undo_ins_parse_undo_rec(), it seems that the SYS_TABLES.NAME must have been corrected by a rollback of a dictionary transaction.

            Possibly the fix should be that trx_resurrect_table_locks() should resurrect table locks even if !table->space.

            It is unclear yet whether 10.2 is affected by this. MDEV-12266 changed some related code in 10.3.

            marko Marko Mäkelä added a comment - I performed an initial analysis of the trace. The problem appears to be that the name of the table does not match the tablespace until the very last call of fil_space_for_table_exists_in_mem() . That is why early calls of fil_ibd_open() returned NULL or !table->is_readable() held (due to !table->space ) in trx_resurrect_table_locks() . However, the final call to fil_table_accessible() in row_undo_ins_parse_undo_rec() would hold, and upon rolling back the changes, we would notice that no table lock was resurrected. It seems to me that MDEV-23484 will require a follow-up fix to make it compatible with the rollback of RENAME operations (which was implemented in MDEV-14717 ). Because trx_resurrect_table_locks() is evicting the unreadable table, and because the table will be loaded with the correct name in dict_table_open_on_id() by row_undo_ins_parse_undo_rec() , it seems that the SYS_TABLES.NAME must have been corrected by a rollback of a dictionary transaction. Possibly the fix should be that trx_resurrect_table_locks() should resurrect table locks even if !table->space . It is unclear yet whether 10.2 is affected by this. MDEV-12266 changed some related code in 10.3.

            For fixing the bug, I will need another trace, because we lost the original one.

            marko Marko Mäkelä added a comment - For fixing the bug, I will need another trace, because we lost the original one.

            I think that this might already have been fixed in 10.6. If not, MDEV-27234 could share a root cause with this.

            marko Marko Mäkelä added a comment - I think that this might already have been fixed in 10.6. If not, MDEV-27234 could share a root cause with this.

            mleich, is this still repeatable?

            marko Marko Mäkelä added a comment - mleich , is this still repeatable?

            The problem was no more observed since ~ Spring 2021.

            mleich Matthias Leich added a comment - The problem was no more observed since ~ Spring 2021.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.