[MDEV-24569] Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed in log_phys_t::apply Created: 2021-01-11  Updated: 2023-06-01  Resolved: 2021-01-14

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.0, 10.5
Fix Version/s: 10.5.9, 10.6.0

Type: Bug Priority: Blocker
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, regression-10.5, rr-profile

Attachments: File CR-20.cc    
Issue Links:
Problem/Incident
causes MDEV-24864 Fatal error in buf_page_get_low() / f... Closed
causes MDEV-24997 Assertion mtr->is_named_space(page_id... Closed
causes MDEV-25018 [FATAL] InnoDB: Unable to read page (... Closed
Relates
relates to MDEV-24589 DROP TABLE is not crash-safe Closed
relates to MDEV-15528 Avoid writing freed InnoDB pages Closed
relates to MDEV-18518 Implement atomic multi-table (or mult... Closed
relates to MDEV-31088 Server freeze due to innodb_change_bu... Closed

 Description   

origin/10.6 92abdcca5a5324f0727112ab2417d10c7a8d5094 2021-01-07T09:08:09+02:00
Workflow:
1. Start the server
2. Run a DDL/DML mix by 9 sessions
3. During 2. is ongoing kill the server
4. The attempt to restart the server fails with
mysqld:storage/innobase/log/log0recv.cc:274: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007d171d43c859 in __GI_abort () at abort.c:79
#2  0x00007d171d43c729 in __assert_fail_base (fmt=0x7d171d5d2588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x564bea837fc0 "mach_read_from_4(frame + 4U) == block.page.id().page_no()", 
    file=0x564bea837960 "/Server/10.6E/storage/innobase/log/log0recv.cc", line=274, function=<optimized out>) at assert.c:92
#3  0x00007d171d44df36 in __GI___assert_fail (assertion=0x564bea837fc0 "mach_read_from_4(frame + 4U) == block.page.id().page_no()", file=0x564bea837960 "/Server/10.6E/storage/innobase/log/log0recv.cc", line=274, 
    function=0x564bea837d40 "log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const") at assert.c:101
#4  0x0000564be906714e in log_phys_t::apply (this=0x6400003b2740, block=..., last_offset=@0x608001af92e4: 0) at /Server/10.6E/storage/innobase/log/log0recv.cc:274
#5  0x0000564be905473c in recv_recover_page (block=0x6400000e8360, mtr=..., Traceback (most recent call last):
  File "/lib/x86_64-linux-gnu/../../share/gcc-10/python/libstdcxx/v6/printers.py", line 1522, in __call__
    return self.lookup[basename].invoke(val)
  File "/lib/x86_64-linux-gnu/../../share/gcc-10/python/libstdcxx/v6/printers.py", line 1459, in invoke
    return self.function(self.name, value)
  File "/lib/x86_64-linux-gnu/../../share/gcc-10/python/libstdcxx/v6/printers.py", line 649, in __init__
    self.link_type = nodetype.pointer()
AttributeError: 'NoneType' object has no attribute 'pointer'
p=
..., space=0x613000000c98, init=0x6070000117f0) at /Server/10.6E/storage/innobase/log/log0recv.cc:2316
#6  0x0000564be906f5c6 in recv_sys_t::recover_low (this=0x564bebab2da0 <recv_sys>, page_id=..., Traceback (most recent call last):
  File "/lib/x86_64-linux-gnu/../../share/gcc-10/python/libstdcxx/v6/printers.py", line 1522, in __call__
    return self.lookup[basename].invoke(val)
  File "/lib/x86_64-linux-gnu/../../share/gcc-10/python/libstdcxx/v6/printers.py", line 1459, in invoke
    return self.function(self.name, value)
  File "/lib/x86_64-linux-gnu/../../share/gcc-10/python/libstdcxx/v6/printers.py", line 649, in __init__
    self.link_type = nodetype.pointer()
AttributeError: 'NoneType' object has no attribute 'pointer'
p=
..., mtr=..., b=0x6400000e8360) at /Server/10.6E/storage/innobase/log/log0recv.cc:2590
#7  0x0000564be905753b in recv_sys_t::apply (this=0x564bebab2da0 <recv_sys>, last_batch=true) at /Server/10.6E/storage/innobase/log/log0recv.cc:2705
#8  0x0000564be92efde0 in srv_start (create_new_db=false) at /Server/10.6E/storage/innobase/srv/srv0start.cc:1470
#9  0x0000564be8e6c164 in innodb_init (p=0x615000002898) at /Server/10.6E/storage/innobase/handler/ha_innodb.cc:3689
#10 0x0000564be8350be1 in ha_initialize_handlerton (plugin=0x621000033010) at /Server/10.6E/sql/handler.cc:645
#11 0x0000564be7b895b4 in plugin_initialize (tmp_root=0x7ffe97cc90a0, plugin=0x621000033010, argc=0x564bebff7a00 <remaining_argc>, argv=0x615000000e08, options_only=false) at /Server/10.6E/sql/sql_plugin.cc:1459
#12 0x0000564be7b8b542 in plugin_init (argc=0x564bebff7a00 <remaining_argc>, argv=0x615000000e08, flags=0) at /Server/10.6E/sql/sql_plugin.cc:1751
#13 0x0000564be784466b in init_server_components () at /Server/10.6E/sql/mysqld.cc:4950
#14 0x0000564be784665e in mysqld_main (argc=28, argv=0x615000000e08) at /Server/10.6E/sql/mysqld.cc:5539
#15 0x0000564be783014d in main (argc=27, argv=0x7ffe97ccb7d8) at /Server/10.6E/sql/main.cc:25
(rr)
 
pluto:/data/Results/1610366087/CR-20/dev/shm/vardir/1610366087/50/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0  # Fate of the server till SIGKILL
_RR_TRACE_DIR="." rr replay --mark-stdio                      # The failing restart
 
 
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \                   
--no-mask \
--seed=1610366105 \
--duration=300 \
--engine=InnoDB \
--queries=100000000 \
--mysqld=--loose_innodb_use_native_aio=1 \
--reporters=Backtrace,ErrorLog,Deadlock1,CrashRecovery1 \
--mysqld=--innodb_undo_tablespaces=3 \
--mysqld=--innodb_immediate_scrub_data_uncompressed=1 \
--mysqld=--innodb_file_per_table=0 \
--restart_timeout=600 \
--max_gd_duration=1200 \
--mysqld=--innodb_page_size=8K \
--mysqld=--innodb-buffer-pool-size=8M \
--mysqld=--log_output=none \
--sqltrace=MarkErrors \
--grammar=conf/mariadb/innodb_compression_encryption.yy \
--gendata=conf/mariadb/innodb_compression_encryption.zz \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--loose-file-key-management-filename=$RQG/conf/mariadb/encryption_keys.txt \
--threads=9 \
--duration=300 \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--basedir2=<local settings> \
--script_debug=_nix_ \
--rr=Extended \
--rr_options=--chaos
 
You will need several attempts for replaying the bug once.



 Comments   
Comment by Matthias Leich [ 2021-01-11 ]

Elena observed the same issue on 10.5-fae87e0c.
Therefore I added 10.5 to the list of to be fixed releases.

Comment by Matthias Leich [ 2021-01-11 ]

Alternative way of replaying the problem
cd <to level directory of RQG>
./RR_BATCH CR-20.cc <path to mariadb binaries like /Server_bin/10.6E_asan>
In some second window run
    grep CR-20 last_batch_workdir/result.txt | grep .v li
as soon as you see a line like
2020-12-16T13:14:23 |    126 |    104 | replay           | 000125.log |     104 |     592 | STATUS_RECOVERY_FAILURE--CR-20
you have made a replay.

Comment by Marko Mäkelä [ 2021-01-11 ]

I did some initial debugging of this:

watch -l block.page_id_
reverse-continue
break log0recv.cc:201 thread 1
continue
continue
display/x *recs@*(uint16_t*)(this+1)
continue
continue

With this, I can dump the records for the page. We see that the system tablespace page 606 is fully initialized by the mini-transaction. The first record is a large WRITE plus something else, comprisiong 35,144 bytes in total. The innodb_page_size is 8,192 bytes. The contents of that record (possibly records from multiple mini-transactions) does not seem to matter, because the subsequent records are as follows:

Thread 1 hit Breakpoint 2, log_phys_t::apply (this=0x640000a0d3b0, block=..., 
    last_offset=@0x608001af92e4: 1)
    at /Server/10.6E/storage/innobase/log/log0recv.cc:211
211	      const byte b= *l++;
1: /x *recs@*(uint16_t*)(this+1) = {0x3, 0x0, 0x81, 0xde}
(rr) continue
Continuing.
 
Thread 1 hit Breakpoint 2, log_phys_t::apply (this=0x640000a0d3b0, block=..., 
    last_offset=@0x608001af92e4: 1)
    at /Server/10.6E/storage/innobase/log/log0recv.cc:211
211	      const byte b= *l++;
1: /x *recs@*(uint16_t*)(this+1) = {0x3, 0x0, 0x81, 0xde}
(rr) continue
Continuing.
 
Thread 1 hit Breakpoint 2, log_phys_t::apply (this=0x6400003b2740, block=..., 
    last_offset=@0x608001af92e4: 1)
    at /Server/10.6E/storage/innobase/log/log0recv.cc:211
211	      const byte b= *l++;
1: /x *recs@*(uint16_t*)(this+1) = {0x36, 0x0, 0x81, 0xde, 0x3e, 0x2, 0x6c}
(rr) continue
Continuing.
mysqld: /Server/10.6E/storage/innobase/log/log0recv.cc:274: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.

So, we got 3 FREE_PAGE records for the page (606 is encoded as 0x81de), followed by a WRITE.
Why did recovery not discard earlier records for the page when seeing each FREE_PAGE? It should do that also when seeing INIT_PAGE, to save memory or to reduce the log apply time. It would be nice to fix this.

Before the server was killed, it apparently wrote a WRITE record, writing the bytes 0x2, 0x6c to offset 0x3e on the freed page 606. This is the root cause of the bug. We are probably missing some write of a record. Before the server was killed, the page was created here:

ssh pluto
_RR_TRACE_DIR=/data/Results/1610366087/CR-20/dev/shm/vardir/1610366087/50/1/rr rr replay mysqld-0

Thread 6 hit Breakpoint 2, buf_page_create (space=0x613000000c98, offset=606, 
    zip_size=0, mtr=0x3c8026a236c0, free_block=0x35c362ac1560)
    at /Server/10.6E/storage/innobase/buf/buf0buf.cc:3516
3516	  ut_ad(mtr->is_active());
1: page_id = {m_id = 606}
(rr) when
Current event: 668358
(rr) fin
Run till exit from #0  buf_page_create (space=0x613000000c98, offset=606, 
    zip_size=0, mtr=0x3c8026a236c0, free_block=0x35c362ac1560)
    at /Server/10.6E/storage/innobase/buf/buf0buf.cc:3516
0x00005608a4d4af09 in fsp_page_create (space=0x613000000c98, offset=606, 
    mtr=0x3c8026a236c0) at /Server/10.6E/storage/innobase/fsp/fsp0fsp.cc:1039
1039	  buf_block_t *block= buf_page_create(space, static_cast<uint32_t>(offset),
Value returned is $1 = (buf_block_t *) 0x35c362ac1560
(rr) disable 2
(rr) watch -l $1.page.id_.m_id
Hardware watchpoint 3: -location $1.page.id_.m_id
(rr) c
[Switching to Thread 2120292.2140161]
 
Thread 15 hit Hardware watchpoint 3: -location $1.page.id_.m_id
 
Old value = 606
New value = 18446744073709551615
buf_page_t::set_corrupt_id (this=0x35c362ac1560)
    at /Server/10.6E/storage/innobase/include/buf0buf.h:2104
2104	}
(rr) bt
#0  buf_page_t::set_corrupt_id (this=0x35c362ac1560)
    at /Server/10.6E/storage/innobase/include/buf0buf.h:2104
#1  0x00005608a4c4f5da in buf_page_t::free_file_page (this=0x35c362ac1560)
    at /Server/10.6E/storage/innobase/include/buf0buf.h:890
#2  0x00005608a4c44379 in buf_LRU_block_free_hashed_page (block=0x35c362ac1560)
    at /Server/10.6E/storage/innobase/buf/buf0lru.cc:130
#3  0x00005608a4c4b0ca in buf_LRU_free_page (bpage=0x35c362ac1560, zip=true)
    at /Server/10.6E/storage/innobase/buf/buf0lru.cc:964
#4  0x00005608a4c317da in buf_page_write_complete (request=...)
(rr) when
Current event: 2476033
(rr) ena 2
(rr) disa 3
(rr) c
Continuing.

We would like to find the last time the page was freed and subsequently the record was written. It might be easier to let the execution continue to the end, then set the breakpoint inside buf_page_create(), and set a watchpoint on the returned block frame, to find the problematic write to the page. It might also be useful to set a watchpoint on block->lock.lock.lock._M_i (the lock word) to find the mtr_t::commit() that wrote the WRITE record. I did not do that now, because the trace leading to the server kill is pretty long (ending at event 6851936).

Comment by Matthias Leich [ 2021-01-13 ]

Preliminary results of RQG testing on origin/bb-10.5-MDEV-24569 2181fa74fdda0bba8a2cc92b555e39d0b9162877 2021-01-13T21:54:30+05:30.
1. Using the replay test from the first comment
     I give up after ~ 1000 attempts without replay of the problem.
     On  10.6 2021-01-07   I needed 90 RQG tests for hitting the problem 4 times.
2. I hit a 
     mysqld: /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc:1811: dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t): Assertion `index->id == btr_page_get_index_id(page)' failed.
     (rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000032f55d44e859 in __GI_abort () at abort.c:79
#2  0x000032f55d44e729 in __assert_fail_base (fmt=0x32f55d5e4588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5558f9fcfd60 "index->id == btr_page_get_index_id(page)", 
    file=0x5558f9fcd660 "/Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc", line=1811, function=<optimized out>) at assert.c:92
#3  0x000032f55d45ff36 in __GI___assert_fail (assertion=0x5558f9fcfd60 "index->id == btr_page_get_index_id(page)", file=0x5558f9fcd660 "/Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc", line=1811, 
    function=0x5558f9fcef00 "dberr_t btr_cur_search_to_nth_level_func(dict_index_t*, ulint, const dtuple_t*, page_cur_mode_t, ulint, btr_cur_t*, rw_lock_t*, const char*, unsigned int, mtr_t*, ib_uint64_t)") at assert.c:101
#4  0x00005558f890c882 in btr_cur_search_to_nth_level_func (index=0x61700008afa0, level=0, tuple=0x625001a1fdc0, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x61f000000540, ahi_latch=0x0, 
    file=0x5558f9eb08a0 "/Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc", line=1213, mtr=0x233825740060, autoinc=0) at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc:1811
#5  0x00005558f873fbdf in btr_pcur_open_low (index=0x61700008afa0, level=0, tuple=0x625001a1fdc0, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x61f000000540, file=0x5558f9eb08a0 "/Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc", line=1213, autoinc=0, 
    mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/include/btr0pcur.ic:441
#6  0x00005558f87461f6 in row_search_on_row_ref (pcur=0x61f000000540, mode=2, table=0x618000040920, ref=0x625001a1fdc0, mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc:1213
#7  0x00005558f872edfd in row_purge_reposition_pcur (mode=2, node=0x61f0000004a0, mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:78
#8  0x00005558f8731a05 in row_purge_reset_trx_id (node=0x61f0000004a0, mtr=0x233825740060) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:658
#9  0x00005558f87351ec in row_purge_record_func (node=0x61f0000004a0, undo_rec=0x62500194cf60 "", thr=0x61f0000003f0, updated_extern=false) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:1064
#10 0x00005558f873558b in row_purge (node=0x61f0000004a0, undo_rec=0x62500194cf60 "", thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:1108
#11 0x00005558f87358d1 in row_purge_step (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:1157
#12 0x00005558f860c887 in que_thr_step (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:946
#13 0x00005558f860ccf3 in que_run_threads_low (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1008
#14 0x00005558f860d152 in que_run_threads (thr=0x61f0000003f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1048
#15 0x00005558f8825d1f in trx_purge (n_tasks=4, truncate=false) at /Server/bb-10.5-MDEV-24569/storage/innobase/trx/trx0purge.cc:1300
#16 0x00005558f87bc7bc in srv_do_purge (n_total_purged=0x2338257409a0) at /Server/bb-10.5-MDEV-24569/storage/innobase/srv/srv0srv.cc:1997
#17 0x00005558f87bd281 in purge_coordinator_callback_low () at /Server/bb-10.5-MDEV-24569/storage/innobase/srv/srv0srv.cc:2087
#18 0x00005558f87bd4e5 in purge_coordinator_callback () at /Server/bb-10.5-MDEV-24569/storage/innobase/srv/srv0srv.cc:2116
#19 0x00005558f8bf77d4 in tpool::task_group::execute (this=0x5558fbf541c0 <purge_coordinator_task_group>, t=0x5558fbf54280 <purge_coordinator_task>) at /Server/bb-10.5-MDEV-24569/tpool/task_group.cc:55
#20 0x00005558f8bf80db in tpool::task::execute (this=0x5558fbf54280 <purge_coordinator_task>) at /Server/bb-10.5-MDEV-24569/tpool/task.cc:47
#21 0x00005558f8be4099 in tpool::thread_pool_generic::worker_main (this=0x618000000480, thread_var=0x63000001a580) at /Server/bb-10.5-MDEV-24569/tpool/tpool_generic.cc:546
#22 0x00005558f8bf7235 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
    @0x6040000c0f28: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x5558f8be3eb6 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x6040000c0f20: 0x618000000480)
    at /usr/include/c++/9/bits/invoke.h:73
#23 0x00005558f8bf6fe0 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
    @0x6040000c0f28: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x5558f8be3eb6 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
#24 0x00005558f8bf6e71 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x6040000c0f18) at /usr/include/c++/9/thread:244
#25 0x00005558f8bf6dbb in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x6040000c0f18) at /usr/include/c++/9/thread:251
#26 0x00005558f8bf6d20 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x6040000c0f10)
    at /usr/include/c++/9/thread:195
#27 0x00003f787144fd84 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#28 0x00003f787135f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#29 0x000032f55d54b293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
sdp:/home/mleich/RQG_O/storage/1610556037/tmp1/dev/shm/vardir/1610556037/232/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
     after  test setup like in first comment
     DDL/DML load
     SIGKILL DB server process
     Restart with success
     # 2021-01-13T09:36:00 [3527603] INFO: Reporter 'CrashRecovery1': Executing CHECK TABLE `test`.`t6` EXTENDED.
     # 2021-01-13T09:36:05 [3527603] INFO: Reporter 'CrashRecovery1': Executing ANALYZE TABLE `test`.`t6`.
     # 2021-01-13T09:36:06 [3527603] INFO: Reporter 'CrashRecovery1': Executing OPTIMIZE TABLE `test`.`t6`.
     # 2021-01-13T09:36:15 [3527603] INFO: Reporter 'CrashRecovery1': Executing REPAIR TABLE `test`.`t6` EXTENDED.
     # 2021-01-13T09:36:15 [3527603] INFO: Reporter 'CrashRecovery1': Executing ALTER TABLE `test`.`t6` FORCE.
     # 2021-01-13T09:37:03 [3527603] WARN: Auxpid 3743269 exited with exit status 134.  <== The process which started the server under "rr" exited.
 
Some pointers may be invalid and cause the dump to abort.
# 2021-01-13T09:37:03 [3527603] | [rr 3743312 163255]Query (0x0): [rr 3743312 163257](null)[rr 3743312 163259]
# 2021-01-13T09:37:03 [3527603] | Connection ID (thread ID): 0
# 2021-01-13T09:37:03 [3527603] | [rr 3743312 163261]Status: NOT_KILLED
 
This problem is now reported as 
https://jira.mariadb.org/browse/MDEV-24589     

Comment by Marko Mäkelä [ 2021-01-14 ]

Thank you, mleich and thiru! I think that the observed failure is a separate bug, which I will analyze shortly.

It seems that I made a mistake when analyzing the trace of the crashed recovery, probably by setting the breakpoint too early. I now double-checked that indeed in log_phys_t::apply() there will be only 2 records applied to the page 606 before the crash:

_RR_TRACE_DIR=/data/Results/1610366087/CR-20/dev/shm/vardir/1610366087/50/1/rr rr replay
continue
frame 4
watch -l block.page.id_
reverse-continue
reverse-continue
break log0recv.cc:207
continue
continue
display/x *recs@*(uint16_t*)(this+1)
continue
continue
quit

The output ends as follows:

(rr) continue
Continuing.
 
Thread 1 hit Hardware watchpoint 1: -location block.page.id_
 
Old value = {m_id = 18446744073709551615}
New value = {m_id = 606}
buf_page_t::init (this=0x6400000e8360, id=..., buf_fix_count=1) at /Server/10.6E/storage/innobase/include/buf0buf.h:842
842	    buf_fix_count_= buf_fix_count;
(rr) 
Continuing.
 
Thread 1 hit Breakpoint 2, log_phys_t::apply (this=0x640000a0d3b0, block=..., last_offset=@0x608001af92e4: 1) at /Server/10.6E/storage/innobase/log/log0recv.cc:207
207	    apply_status applied= APPLIED_NO;
(rr) display/x *recs@*(uint16_t*)(this+1)
1: /x *recs@*(uint16_t*)(this+1) = {0x3, 0x0, 0x81, 0xde}
(rr) continue
Continuing.
 
Thread 1 hit Breakpoint 2, log_phys_t::apply (this=0x6400003b2740, block=..., last_offset=@0x608001af92e4: 1) at /Server/10.6E/storage/innobase/log/log0recv.cc:207
207	    apply_status applied= APPLIED_NO;
1: /x *recs@*(uint16_t*)(this+1) = {0x36, 0x0, 0x81, 0xde, 0x3e, 0x2, 0x6c}
(rr) continue
Continuing.
mysqld: /Server/10.6E/storage/innobase/log/log0recv.cc:274: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
 
Thread 1 received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

So, the log parsing is indeed working in the optimal way: discarding any preceding records for the page when an INIT_PAGE or FREE_PAGE record is being parsed. We will fail on attempting to apply the WRITE record after FREE_PAGE.

Comment by Matthias Leich [ 2021-01-14 ]

mysqld: /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1775: void lock_rec_add_to_queue(unsigned int, const buf_block_t*, ulint, dict_index_t*, trx_t*, bool): Assertion `!other_lock' failed.
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000028411bded859 in __GI_abort () at abort.c:79
#2  0x000028411bded729 in __assert_fail_base (fmt=0x28411bf83588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55a435d77cc0 "!other_lock", file=0x55a435d758e0 "/Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc", line=1775, 
    function=<optimized out>) at assert.c:92
#3  0x000028411bdfef36 in __GI___assert_fail (assertion=0x55a435d77cc0 "!other_lock", file=0x55a435d758e0 "/Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc", line=1775, 
    function=0x55a435d77b40 "void lock_rec_add_to_queue(unsigned int, const buf_block_t*, ulint, dict_index_t*, trx_t*, bool)") at assert.c:101
#4  0x000055a4344e2a7d in lock_rec_add_to_queue (type_mode=35, block=0x4a3e2d3c1520, heap_no=2, index=0x6170003277a0, trx=0x74905a4644c0, caller_owns_trx_mutex=false) at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1775
#5  0x000055a4344e7ad4 in lock_move_rec_list_end (new_block=0x4a3e2d3c1520, block=0x4a3e2d3c66b0, rec=0x4a3e2d7ce065 "infimum") at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:2710
#6  0x000055a4345c6a0e in page_copy_rec_list_end (new_block=0x4a3e2d3c1520, block=0x4a3e2d3c66b0, rec=0x4a3e2d7ce065 "infimum", index=0x6170003277a0, mtr=0x38c21e5e4dc0) at /Server/bb-10.5-MDEV-24569/storage/innobase/page/page0page.cc:658
#7  0x000055a4348cdc15 in btr_root_raise_and_insert (flags=0, cursor=0x38c21e5e49a0, offsets=0x38c21e5e4920, heap=0x38c21e5e4900, tuple=0x61700047ac20, n_ext=1, mtr=0x38c21e5e4dc0) at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0btr.cc:1904
#8  0x000055a43491ea43 in btr_cur_pessimistic_insert (flags=0, cursor=0x38c21e5e49a0, offsets=0x38c21e5e4920, heap=0x38c21e5e4900, entry=0x61700047ac20, rec=0x38c21e5e4940, big_rec=0x38c21e5e48e0, n_ext=1, thr=0x621000447ca0, mtr=0x38c21e5e4dc0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc:3746
#9  0x000055a434694e0e in row_ins_clust_index_entry_low (flags=0, mode=33, index=0x6170003277a0, n_uniq=0, entry=0x61700047ac20, n_ext=0, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:2755
#10 0x000055a434697980 in row_ins_clust_index_entry (index=0x6170003277a0, entry=0x61700047ac20, thr=0x621000447ca0, n_ext=0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3225
#11 0x000055a434698154 in row_ins_index_entry (index=0x6170003277a0, entry=0x61700047ac20, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3333
#12 0x000055a4346992c5 in row_ins_index_entry_step (node=0x6210004477c0, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3502
#13 0x000055a434699d9e in row_ins (node=0x6210004477c0, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3661
#14 0x000055a43469afb9 in row_ins_step (thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3807
#15 0x000055a4346e4abd in row_insert_for_mysql (mysql_rec=0x61a00086eeb8 "\320\001", prebuilt=0x6210004471a0, ins_mode=ROW_INS_NORMAL) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0mysql.cc:1421
#16 0x000055a434343582 in ha_innobase::write_row (this=0x61d000a122b8, record=0x61a00086eeb8 "\320\001") at /Server/bb-10.5-MDEV-24569/storage/innobase/handler/ha_innodb.cc:7545
#17 0x000055a433840d51 in handler::ha_write_row (this=0x61d000a122b8, buf=0x61a00086eeb8 "\320\001") at /Server/bb-10.5-MDEV-24569/sql/handler.cc:7151
#18 0x000055a432f4c43e in write_record (thd=0x62b00018f218, table=0x61900064d798, info=0x38c21e5e6820, sink=0x0) at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:2106
#19 0x000055a432f43efb in mysql_insert (thd=0x62b00018f218, table_list=0x62b0001964b8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:1099
#20 0x000055a43300c790 in mysql_execute_command (thd=0x62b00018f218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:4596
#21 0x000055a433024c98 in mysql_parse (thd=0x62b00018f218, 
    rawbuf=0x62b000196238 "INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1 - 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread17 QNO 188 C"..., length=211, 
    parser_state=0x38c21e5e77c0, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:8062
#22 0x000055a432ffaf91 in dispatch_command (command=COM_QUERY, thd=0x62b00018f218, 
    packet=0x6290007df219 " INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1 - 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread17 QNO 188 "..., packet_length=213, 
    is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1889
#23 0x000055a432ff78ba in do_command (thd=0x62b00018f218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1370
#24 0x000055a433439b71 in do_handle_one_connection (connect=0x60800000d6b8, put_in_cache=true) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1410
#25 0x000055a4334394da in handle_one_connection (arg=0x60800000d6b8) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1312
#26 0x000055a4341442eb in pfs_spawn_thread (arg=0x61500000e918) at /Server/bb-10.5-MDEV-24569/storage/perfschema/pfs.cc:2201
#27 0x00007f67f5fef609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x000028411beea293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
sdp:/home/mleich/RQG_O/storage/1610563808/tmp1/dev/shm/vardir/1610563808/173/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
(rr)
     

Comment by Marko Mäkelä [ 2021-01-14 ]

The index ID mismatch after recovery could happen because a freed page was still attached to some other table. The key here seems to be innodb_file_per_table=0.

ssh sdp
_RR_TRACE_DIR=/home/mleich/RQG_O/storage/1610556037/tmp1/dev/shm/vardir/1610556037/232/1/rr rr replay

The PAGE_INDEX_ID was initialized here:

(rr) when
Current event: 129184
(rr) bt
#0  0x00005558f85d14a3 in mtr_t::write<8u, (mtr_t::write_type)0, unsigned long>
    (this=0x61a0000576a8, block=..., ptr=0x17994ce6a042, val=116)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/include/mtr0log.h:209
#1  0x00005558f88e2f1c in PageBulk::init (this=0x61a000057698)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0bulk.cc:101
#2  0x00005558f88e6334 in BtrBulk::insert (this=0x607000e851f8, 
    tuple=0x6310000648d0, level=0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0bulk.cc:1031
#3  0x00005558f865053a in BtrBulk::insert (this=0x607000e851f8, 
    tuple=0x6310000648d0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/include/btr0bulk.h:301
#4  0x00005558f86bf0c1 in row_merge_insert_index_tuples (index=0x61700003bca0, 
#26 0x00005558f701ec98 in mysql_parse (thd=0x62b00007e218, 
    rawbuf=0x62b000085238 "OPTIMIZE TABLE `test`.`t6`", length=26, 
    parser_state=0x50d457e7a7c0, is_com_multi=false, is_next_command=false)
    at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:8062

No earlier contents of this page 116 is available in the buffer pool. Earlier, the memory was used for buffering redo log records during recovery.

The OPTIMIZE TABLE that is rebuilding the table t6 is reusing the page 116 for a PRIMARY key page of the rebuilt table. But, the GEN_CLUST_INDEX of table E is pointing to this page in purge, even though the page was already reused by the OPTIMIZE TABLE of another table:

#6  0x00005558f87461f6 in row_search_on_row_ref (pcur=0x61f000000540, mode=2, 
    table=0x618000040920, ref=0x625001a1fdc0, mtr=0x233825740060)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0row.cc:1213
#7  0x00005558f872edfd in row_purge_reposition_pcur (mode=2, node=0x61f0000004a0, 
    mtr=0x233825740060)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0purge.cc:78

In the root page of that clustered index (page 92), we have PAGE_LEVEL=1 and we do have a record (DB_ROW_ID,page)=(69657,116).

We will have to find out when the allocation status of page 116 changed, in the allocation bitmap page (page 0 in the system tablespace):

(rr) find_page_105 0 0
$25 = (buf_block_t *) 0x17994c6655e8
(rr) display/x $25.frame[150+24+116*2/8]

At the end of the last recovery batch, these bits are 0xef. As expected, the least significant bit will be cleared (the byte will first be changed to 0xee) during our OPTIMIZE TABLE:

#0  mtr_t::write<1u, (mtr_t::write_type)0, unsigned char> (this=0x50d457e72280, 
    block=..., ptr=0x17994cccc0cb, val=238 '\356')
    at /Server/bb-10.5-MDEV-24569/storage/innobase/include/mtr0log.h:211
#1  0x00005558f8ae15af in xdes_set_free<false> (block=..., descr=0x17994cccc096 "", 
    offset=116, mtr=0x50d457e72280)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:164
#2  0x00005558f8ad3654 in fsp_alloc_from_free_frag (header=0x17994c6655e8, 
    xdes=0x17994c6655e8, descr=0x17994cccc096 "", bit=116, mtr=0x50d457e72280)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:1016
#3  0x00005558f8ad432b in fsp_alloc_free_page (space=0x615000003518, hint=0, 
    mtr=0x50d457e72280, init_mtr=0x61a0000576a8)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:1162
#4  0x00005558f8ad9fd8 in fseg_alloc_free_page_low (space=0x615000003518, 
    seg_inode=0x17994ce1e172 "", iblock=0x17994c674390, hint=0, direction=111 'o', 
    has_done_reservation=true, mtr=0x50d457e72280, init_mtr=0x61a0000576a8)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:2084
#5  0x00005558f8adb0ba in fseg_alloc_free_page_general (seg_header=0x17994c85004a "", 
    hint=0, direction=111 'o', has_done_reservation=true, mtr=0x50d457e72280, 
    init_mtr=0x61a0000576a8)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:2215
#6  0x00005558f88bb007 in btr_page_alloc_low (index=0x61700003bca0, hint_page_no=0, 
    file_direction=111 'o', level=0, mtr=0x50d457e72280, init_mtr=0x61a0000576a8)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0btr.cc:536
#7  0x00005558f88bb07f in btr_page_alloc (index=0x61700003bca0, hint_page_no=0, 
    file_direction=111 'o', level=0, mtr=0x50d457e72280, init_mtr=0x61a0000576a8)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0btr.cc:569
#8  0x00005558f88e2a02 in PageBulk::init (this=0x61a000057698)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0bulk.cc:73

I will have to check when the page was freed before the server was killed, and why it happened while table E still exists. My guess is an old problem: DDL is not actually safe in InnoDB. We would invoke btr_free() already while delete-marking the record in SYS_INDEXES, before that transaction is actually committed (to be fixed in MDEV-18518). Soon we will see if my guess is right.

Comment by Marko Mäkelä [ 2021-01-14 ]

I validated my guess. The B-tree is indeed being freed here:

Thread 1 hit Hardware watchpoint 1: -location $1.frame[150+24+116*2/8]
 
Old value = 234 '\352'
New value = 235 '\353'
mtr_t::write<1u, (mtr_t::write_type)0, unsigned char> (this=0x33fe310a3230, block=..., ptr=0x6c9d750a00cb, val=235 '\353') at /Server/bb-10.5-MDEV-24569/storage/innobase/include/mtr0log.h:211
211	             (ut_align_offset(p, srv_page_size)), p, end - p);
(rr) when
Current event: 1166976
(rr) bt
#0  0x00005650f61d59b4 in mtr_t::write<1u, (mtr_t::write_type)0, unsigned char> (this=0x33fe310a3230, block=..., ptr=0x6c9d750a00cb, val=235 '\353')
    at /Server/bb-10.5-MDEV-24569/storage/innobase/include/mtr0log.h:209
#1  0x00005650f66e62f7 in xdes_set_free<true> (block=..., descr=0x6c9d750a0096 "", offset=116, mtr=0x33fe310a3230) at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:164
#2  0x00005650f66d8c05 in fsp_free_page (space=0x615000003518, offset=116, mtr=0x33fe310a3230) at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:1227
#3  0x00005650f66e046c in fseg_free_page_low (seg_inode=0x6c9d7520c032 "", iblock=0x6c9d74f487a0, space=0x615000003518, offset=116, mtr=0x33fe310a3230)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:2468
#4  0x00005650f66e22c4 in fseg_free_step (header=0x6c9d753d804a "", mtr=0x33fe310a3230) at /Server/bb-10.5-MDEV-24569/storage/innobase/fsp/fsp0fsp.cc:2708
#5  0x00005650f64c309c in btr_free_but_not_root (block=0x6c9d74f5cb10, log_mode=MTR_LOG_ALL) at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0btr.cc:1173
#6  0x00005650f64c33ae in btr_free_if_exists (page_id=..., zip_size=0, index_id=98, mtr=0x33fe310a3d10) at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0btr.cc:1221
#7  0x00005650f65f6fd4 in dict_drop_index_tree (pcur=0x629000e9a6a8, trx=0x58176ee5b9a8, mtr=0x33fe310a3d10) at /Server/bb-10.5-MDEV-24569/storage/innobase/dict/dict0crea.cc:955
#8  0x00005650f639d5b8 in row_upd_clust_step (node=0x629000e99f40, thr=0x625000f7b7a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0upd.cc:2819
#9  0x00005650f639e866 in row_upd (node=0x629000e99f40, thr=0x625000f7b7a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0upd.cc:2992
#10 0x00005650f639f7fa in row_upd_step (thr=0x625000f7b7a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0upd.cc:3136
#11 0x00005650f62107a5 in que_thr_step (thr=0x625000f7b7a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:928
#12 0x00005650f6210cf3 in que_run_threads_low (thr=0x625000f7b7a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1008
#13 0x00005650f6211152 in que_run_threads (thr=0x625000f7b7a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1048
#14 0x00005650f62116cd in que_eval_sql (info=0x6170003151a0, 
    sql=0x5650f7a7f420 "PROCEDURE DROP_TABLE_PROC () IS\ntid CHAR;\niid CHAR;\nDECLARE CURSOR cur_idx IS\nSELECT ID FROM SYS_INDEXES\nWHERE TABLE_ID = tid FOR UPDATE;\nBEGIN\nSELECT ID INTO tid FROM SYS_TABLES\nWHERE NAME = :name FO"..., reserve_dict_mutex=false, trx=0x58176ee5b9a8) at /Server/bb-10.5-MDEV-24569/storage/innobase/que/que0que.cc:1125
#15 0x00005650f62f09e4 in row_drop_table_for_mysql (name=0x33fe310a56e0 "test/E", trx=0x58176ee5b9a8, sqlcom=SQLCOM_DROP_TABLE, create_failed=false, nonatomic=true)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0mysql.cc:3676
#16 0x00005650f5f97f04 in ha_innobase::delete_table (this=0x62b000142ca8, name=0x33fe310a75b0 "./test/E", sqlcom=SQLCOM_DROP_TABLE) at /Server/bb-10.5-MDEV-24569/storage/innobase/handler/ha_innodb.cc:13206
#17 0x00005650f5f62017 in ha_innobase::delete_table (this=0x62b000142ca8, name=0x33fe310a75b0 "./test/E") at /Server/bb-10.5-MDEV-24569/storage/innobase/handler/ha_innodb.cc:13333
#18 0x00005650f5408207 in hton_drop_table (hton=0x615000002898, path=0x33fe310a75b0 "./test/E") at /Server/bb-10.5-MDEV-24569/sql/handler.cc:564
#19 0x00005650f54152c5 in ha_delete_table (thd=0x62b00013b218, hton=0x615000002898, path=0x33fe310a75b0 "./test/E", db=0x33fe310a6d80, alias=0x33fe310a6da0, generate_warning=true)
    at /Server/bb-10.5-MDEV-24569/sql/handler.cc:2770
#20 0x00005650f4e797b0 in mysql_rm_table_no_locks (thd=0x62b00013b218, tables=0x62b000142368, if_exists=true, drop_temporary=false, drop_view=false, drop_sequence=false, dont_log_query=false, 
    dont_free_locks=false) at /Server/bb-10.5-MDEV-24569/sql/sql_table.cc:2509
#21 0x00005650f4e77549 in mysql_rm_table (thd=0x62b00013b218, tables=0x62b000142368, if_exists=true, drop_temporary=false, drop_sequence=false, dont_log_query=false)
    at /Server/bb-10.5-MDEV-24569/sql/sql_table.cc:2138
#22 0x00005650f4c0e086 in mysql_execute_command (thd=0x62b00013b218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:4980
#23 0x00005650f4c22c98 in mysql_parse (thd=0x62b00013b218, rawbuf=0x62b000142238 "DROP TABLE IF EXISTS E /* E_R Thread9 QNO 42 CON_ID 23 */", length=57, parser_state=0x33fe310a87c0, is_com_multi=false, 
    is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:8062
(rr) break trx_write_serialisation_history thread 1
Breakpoint 2 at 0x5650f648204a: file /Server/bb-10.5-MDEV-24569/storage/innobase/trx/trx0trx.cc, line 1053.
(rr) continue
Continuing.
 
Thread 1 hit Breakpoint 2, trx_write_serialisation_history (trx=0x5650f67aaffd <yyrestart(FILE*)+274>, mtr=0x33fe310a3ac0) at /Server/bb-10.5-MDEV-24569/storage/innobase/trx/trx0trx.cc:1053
1053	{
(rr) when
Current event: 1167020
(rr) finish
Run till exit from #0  trx_write_serialisation_history (trx=0x5650f67aaffd <yyrestart(FILE*)+274>, mtr=0x33fe310a3ac0) at /Server/bb-10.5-MDEV-24569/storage/innobase/trx/trx0trx.cc:1053
trx_t::commit_low (this=0x58176ee5b9a8, mtr=0x33fe310a40f0) at /Server/bb-10.5-MDEV-24569/storage/innobase/trx/trx0trx.cc:1543
1543	    mtr->commit();
(rr) next
1546	  if (debug_sync)
(rr) p *mtr
$6 = {m_start = true, m_commit = true, …, m_commit_lsn = 78146879, m_freed_space = 0x0, m_freed_pages = 0x0}
(rr) continue
Continuing.
Thread-specific breakpoint 2 deleted - thread 1 no longer in the thread list.
 
Thread 3 received signal SIGKILL, Killed.
[Switching to Thread 3529385.3529624]
0x0000000070000002 in ?? ()
(rr) p log_sys.lsn
$8 = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 78228899}, <No data fields>}
(rr) p log_sys.flushed_to_disk_lsn 
$9 = {<std::__atomic_base<unsigned long>> = {static _S_alignment = 8, _M_i = 78144573}, <No data fields>}

Even though the DROP TABLE IF EXISTS E happened to finish committing in memory before the SIGKILL was received, the commit was not yet made persistent at the time of the kill. The end LSN of the transaction commit was mtr->m_commit_lsn==78146879, which is a few hundred bytes ahead of log_sys.flushed_to_disk_lsn == 78144573.
We would have had a similar disastrous effect as soon as the first mini-transaction of btr_free_if_exists() had been made durable.
The correct course of action would be to start dropping the B-tree in a purge-like operation, after the DDL transaction was committed, just like I have suggested in MDEV-18518.

Comment by Marko Mäkelä [ 2021-01-14 ]

ssh sdp
_RR_TRACE_DIR=/home/mleich/RQG_O/storage/1610563808/tmp1/dev/shm/vardir/1610563808/173/1/rr rr replay

That failure should be completely unrelated to this bug. I was first suspecting that it could be fixed by MDEV-20605, but that does not seem to be the case. Here is the assertion failure and the creation of the conflicting lock shortly earlier:

Thread 3 received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	in ../sysdeps/unix/sysv/linux/raise.c
(rr) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000028411bded859 in __GI_abort () at abort.c:79
#2  0x000028411bded729 in __assert_fail_base (fmt=0x28411bf83588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55a435d77cc0 "!other_lock", 
    file=0x55a435d758e0 "/Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc", line=1775, function=<optimized out>) at assert.c:92
#3  0x000028411bdfef36 in __GI___assert_fail (assertion=0x55a435d77cc0 "!other_lock", file=0x55a435d758e0 "/Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc", line=1775, 
    function=0x55a435d77b40 "void lock_rec_add_to_queue(unsigned int, const buf_block_t*, ulint, dict_index_t*, trx_t*, bool)") at assert.c:101
#4  0x000055a4344e2a7d in lock_rec_add_to_queue (type_mode=35, block=0x4a3e2d3c1520, heap_no=2, index=0x6170003277a0, trx=0x74905a4644c0, caller_owns_trx_mutex=false)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1775
#5  0x000055a4344e7ad4 in lock_move_rec_list_end (new_block=0x4a3e2d3c1520, block=0x4a3e2d3c66b0, rec=0x4a3e2d7ce065 "infimum") at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:2710
#6  0x000055a4345c6a0e in page_copy_rec_list_end (new_block=0x4a3e2d3c1520, block=0x4a3e2d3c66b0, rec=0x4a3e2d7ce065 "infimum", index=0x6170003277a0, mtr=0x38c21e5e4dc0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/page/page0page.cc:658
#7  0x000055a4348cdc15 in btr_root_raise_and_insert (flags=0, cursor=0x38c21e5e49a0, offsets=0x38c21e5e4920, heap=0x38c21e5e4900, tuple=0x61700047ac20, n_ext=1, mtr=0x38c21e5e4dc0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0btr.cc:1904
#8  0x000055a43491ea43 in btr_cur_pessimistic_insert (flags=0, cursor=0x38c21e5e49a0, offsets=0x38c21e5e4920, heap=0x38c21e5e4900, entry=0x61700047ac20, rec=0x38c21e5e4940, big_rec=0x38c21e5e48e0, n_ext=1, 
    thr=0x621000447ca0, mtr=0x38c21e5e4dc0) at /Server/bb-10.5-MDEV-24569/storage/innobase/btr/btr0cur.cc:3746
#9  0x000055a434694e0e in row_ins_clust_index_entry_low (flags=0, mode=33, index=0x6170003277a0, n_uniq=0, entry=0x61700047ac20, n_ext=0, thr=0x621000447ca0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:2755
#10 0x000055a434697980 in row_ins_clust_index_entry (index=0x6170003277a0, entry=0x61700047ac20, thr=0x621000447ca0, n_ext=0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3225
#11 0x000055a434698154 in row_ins_index_entry (index=0x6170003277a0, entry=0x61700047ac20, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3333
#12 0x000055a4346992c5 in row_ins_index_entry_step (node=0x6210004477c0, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3502
#13 0x000055a434699d9e in row_ins (node=0x6210004477c0, thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3661
#14 0x000055a43469afb9 in row_ins_step (thr=0x621000447ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3807
#15 0x000055a4346e4abd in row_insert_for_mysql (mysql_rec=0x61a00086eeb8 "\320\001", prebuilt=0x6210004471a0, ins_mode=ROW_INS_NORMAL) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0mysql.cc:1421
#16 0x000055a434343582 in ha_innobase::write_row (this=0x61d000a122b8, record=0x61a00086eeb8 "\320\001") at /Server/bb-10.5-MDEV-24569/storage/innobase/handler/ha_innodb.cc:7545
#17 0x000055a433840d51 in handler::ha_write_row (this=0x61d000a122b8, buf=0x61a00086eeb8 "\320\001") at /Server/bb-10.5-MDEV-24569/sql/handler.cc:7151
#18 0x000055a432f4c43e in write_record (thd=0x62b00018f218, table=0x61900064d798, info=0x38c21e5e6820, sink=0x0) at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:2106
#19 0x000055a432f43efb in mysql_insert (thd=0x62b00018f218, table_list=0x62b0001964b8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0)
    at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:1099
#20 0x000055a43300c790 in mysql_execute_command (thd=0x62b00018f218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:4596
#21 0x000055a433024c98 in mysql_parse (thd=0x62b00018f218, 
    rawbuf=0x62b000196238 "INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1 - 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread17 QNO 188 C"..., length=211, parser_state=0x38c21e5e77c0, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:8062
#22 0x000055a432ffaf91 in dispatch_command (command=COM_QUERY, thd=0x62b00018f218, 
    packet=0x6290007df219 " INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1 - 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread17 QNO 188 "..., packet_length=213, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1889
#23 0x000055a432ff78ba in do_command (thd=0x62b00018f218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1370
#24 0x000055a433439b71 in do_handle_one_connection (connect=0x60800000d6b8, put_in_cache=true) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1410
#25 0x000055a4334394da in handle_one_connection (arg=0x60800000d6b8) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1312
#26 0x000055a4341442eb in pfs_spawn_thread (arg=0x61500000e918) at /Server/bb-10.5-MDEV-24569/storage/perfschema/pfs.cc:2201
#27 0x00007f67f5fef609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x000028411beea293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) reverse-continue
Continuing.
[Switching to Thread 1745825.1798278]
 
Thread 42 hit Breakpoint 4, lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=2, page_id=..., page=0x4a3e2d75a000 "", heap_no=2, index=0x617000536e20, trx=0x74905a450810, holds_trx_mutex=false)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1332
1332		lock->trx = trx;
Current event: 361529
(rr) thread apply 3 backtrace
 
Thread 3 (Thread 1745825.1797112):
#0  0x0000000070000002 in ?? ()
#1  0x00005cd3326bf3b6 in _raw_syscall () at /build/rr-S0CLEN/rr-5.3.0/src/preload/raw_syscall.S:120
#2  0x00005cd3326bb2ff in traced_raw_syscall (call=call@entry=0x7f67f06aefa0) at ./src/preload/syscallbuf.c:229
#3  0x00005cd3326be37e in sys_futex (call=<optimized out>) at ./src/preload/syscallbuf.c:1355
#4  syscall_hook_internal (call=0x7f67f06aefa0) at ./src/preload/syscallbuf.c:2861
#5  syscall_hook (call=0x7f67f06aefa0) at ./src/preload/syscallbuf.c:2987
#6  0x00005cd3326bb1da in _syscall_hook_trampoline () at /build/rr-S0CLEN/rr-5.3.0/src/preload/syscall_hook.S:282
#7  0x00005cd3326bb20a in __morestack () at /build/rr-S0CLEN/rr-5.3.0/src/preload/syscall_hook.S:417
#8  0x00005cd3326bb225 in _syscall_hook_trampoline_48_3d_00_f0_ff_ff () at /build/rr-S0CLEN/rr-5.3.0/src/preload/syscall_hook.S:428
#9  0x00007f67f5ff67b7 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x38c21e5e57a0, clockid=<optimized out>, expected=0, futex_word=0x62b00018f414) at ../sysdeps/nptl/futex-internal.h:320
#10 __pthread_cond_wait_common (abstime=0x38c21e5e57a0, clockid=<optimized out>, mutex=0x62b00018f360, cond=0x62b00018f3e8) at pthread_cond_wait.c:520
#11 __pthread_cond_timedwait (cond=0x62b00018f3e8, mutex=0x62b00018f360, abstime=0x38c21e5e57a0) at pthread_cond_wait.c:656
#12 0x000055a434dd045d in safe_cond_timedwait (cond=0x62b00018f3e8, mp=0x62b00018f338, abstime=0x38c21e5e57a0, file=0x55a43546d340 "/Server/bb-10.5-MDEV-24569/include/mysql/psi/mysql_thread.h", line=1270) at /Server/bb-10.5-MDEV-24569/mysys/thr_mutex.c:546
#13 0x000055a43345bdb9 in inline_mysql_cond_timedwait (that=0x62b00018f3e8, mutex=0x62b00018f338, abstime=0x38c21e5e57a0, src_file=0x55a435471200 "/Server/bb-10.5-MDEV-24569/sql/mdl.cc", src_line=1166) at /Server/bb-10.5-MDEV-24569/include/mysql/psi/mysql_thread.h:1270
#14 0x000055a43345e72a in MDL_wait::timed_wait (this=0x62b00018f338, owner=0x62b00018f2e8, abs_timeout=0x38c21e5e57a0, set_status_on_timeout=false, wait_state_name=0x55a436a87c70 <MDL_key::m_namespace_to_wait_state_name+48>) at /Server/bb-10.5-MDEV-24569/sql/mdl.cc:1166
#15 0x000055a433462d00 in MDL_context::acquire_lock (this=0x62b00018f338, mdl_request=0x62b000196900, lock_wait_timeout=2) at /Server/bb-10.5-MDEV-24569/sql/mdl.cc:2332
#16 0x000055a432e62030 in open_table_get_mdl_lock (thd=0x62b00018f218, ot_ctx=0x38c21e5e62b0, mdl_request=0x62b000196900, flags=0, mdl_ticket=0x38c21e5e5c90) at /Server/bb-10.5-MDEV-24569/sql/sql_base.cc:1546
#17 0x000055a432e6390d in open_table (thd=0x62b00018f218, table_list=0x62b0001964b8, ot_ctx=0x38c21e5e62b0) at /Server/bb-10.5-MDEV-24569/sql/sql_base.cc:1815
#18 0x000055a432e6ea85 in open_and_process_table (thd=0x62b00018f218, tables=0x62b0001964b8, counter=0x38c21e5e63e0, flags=0, prelocking_strategy=0x38c21e5e6510, has_prelocking_list=false, ot_ctx=0x38c21e5e62b0) at /Server/bb-10.5-MDEV-24569/sql/sql_base.cc:3801
#19 0x000055a432e7160c in open_tables (thd=0x62b00018f218, options=..., start=0x38c21e5e63f0, counter=0x38c21e5e63e0, flags=0, prelocking_strategy=0x38c21e5e6510) at /Server/bb-10.5-MDEV-24569/sql/sql_base.cc:4275
#20 0x000055a432e76750 in open_and_lock_tables (thd=0x62b00018f218, options=..., tables=0x62b0001964b8, derived=true, flags=0, prelocking_strategy=0x38c21e5e6510) at /Server/bb-10.5-MDEV-24569/sql/sql_base.cc:5211
#21 0x000055a432dcd964 in open_and_lock_tables (thd=0x62b00018f218, tables=0x62b0001964b8, derived=true, flags=0) at /Server/bb-10.5-MDEV-24569/sql/sql_base.h:507
#22 0x000055a432f41bf4 in mysql_insert (thd=0x62b00018f218, table_list=0x62b0001964b8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:756
#23 0x000055a43300c790 in mysql_execute_command (thd=0x62b00018f218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:4596
#24 0x000055a433024c98 in mysql_parse (thd=0x62b00018f218, rawbuf=0x62b000196238 "INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1 - 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread17 QNO 188 C"..., length=211, parser_state=0x38c21e5e77c0, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:8062
#25 0x000055a432ffaf91 in dispatch_command (command=COM_QUERY, thd=0x62b00018f218, packet=0x6290007df219 " INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1 - 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread17 QNO 188 "..., packet_length=213, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1889
#26 0x000055a432ff78ba in do_command (thd=0x62b00018f218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1370
#27 0x000055a433439b71 in do_handle_one_connection (connect=0x60800000d6b8, put_in_cache=true) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1410
#28 0x000055a4334394da in handle_one_connection (arg=0x60800000d6b8) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1312
#29 0x000055a4341442eb in pfs_spawn_thread (arg=0x61500000e918) at /Server/bb-10.5-MDEV-24569/storage/perfschema/pfs.cc:2201
#30 0x00007f67f5fef609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#31 0x000028411beea293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) backtrace
#0  lock_rec_create_low (c_lock=0x0, thr=0x0, type_mode=2, page_id=..., page=0x4a3e2d75a000 "", heap_no=2, index=0x617000536e20, trx=0x74905a450810, holds_trx_mutex=false)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1332
#1  0x000055a4344d5e81 in lock_rec_create (c_lock=0x0, thr=0x0, type_mode=2, block=0x4a3e2d3c1520, heap_no=2, index=0x617000536e20, trx=0x74905a450810, caller_owns_trx_mutex=false)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/include/lock0lock.ic:97
#2  0x000055a4344e3501 in lock_rec_lock (impl=false, mode=2, block=0x4a3e2d3c1520, heap_no=2, index=0x617000536e20, thr=0x621000122ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1938
#3  0x000055a4344faa9b in lock_sec_rec_read_check_and_lock (flags=0, block=0x4a3e2d3c1520, rec=0x4a3e2d75a085 "\200", index=0x617000536e20, offsets=0x3bb61b227eb0, mode=LOCK_S, gap_mode=0, thr=0x621000122ca0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:5597
#4  0x000055a43468e52d in row_ins_set_shared_rec_lock (type=0, block=0x4a3e2d3c1520, rec=0x4a3e2d75a085 "\200", index=0x617000536e20, offsets=0x3bb61b227eb0, thr=0x621000122ca0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:1402
#5  0x000055a4346916e3 in row_ins_scan_sec_index_for_duplicate (flags=0, index=0x617000536e20, entry=0x6170000c5038, thr=0x621000122ca0, s_latch=true, mtr=0x3bb61b228660, offsets_heap=0x61a0005bbc98)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:2146
#6  0x000055a4346965b3 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x617000536e20, offsets_heap=0x61a0005bbc98, heap=0x61a0005bb698, entry=0x6170000c5038, trx_id=0, thr=0x621000122ca0)
    at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3015
#7  0x000055a434697dfd in row_ins_sec_index_entry (index=0x617000536e20, entry=0x6170000c5038, thr=0x621000122ca0, check_foreign=true) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3288
#8  0x000055a43469816f in row_ins_index_entry (index=0x617000536e20, entry=0x6170000c5038, thr=0x621000122ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3335
#9  0x000055a4346992c5 in row_ins_index_entry_step (node=0x6210001227c0, thr=0x621000122ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3502
#10 0x000055a434699d9e in row_ins (node=0x6210001227c0, thr=0x621000122ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3661
#11 0x000055a43469afb9 in row_ins_step (thr=0x621000122ca0) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0ins.cc:3807
#12 0x000055a4346e4abd in row_insert_for_mysql (mysql_rec=0x61a0005bc2b8 "\320\001", prebuilt=0x6210001221a0, ins_mode=ROW_INS_NORMAL) at /Server/bb-10.5-MDEV-24569/storage/innobase/row/row0mysql.cc:1421
#13 0x000055a434343582 in ha_innobase::write_row (this=0x61d0009a38b8, record=0x61a0005bc2b8 "\320\001") at /Server/bb-10.5-MDEV-24569/storage/innobase/handler/ha_innodb.cc:7545
#14 0x000055a433840d51 in handler::ha_write_row (this=0x61d0009a38b8, buf=0x61a0005bc2b8 "\320\001") at /Server/bb-10.5-MDEV-24569/sql/handler.cc:7151
#15 0x000055a432f4c43e in write_record (thd=0x62b00028b218, table=0x619000254498, info=0x3bb61b22a820, sink=0x0) at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:2106
#16 0x000055a432f43efb in mysql_insert (thd=0x62b00028b218, table_list=0x62b0002924a8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0)
    at /Server/bb-10.5-MDEV-24569/sql/sql_insert.cc:1099
#17 0x000055a43300c790 in mysql_execute_command (thd=0x62b00028b218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:4596
#18 0x000055a433024c98 in mysql_parse (thd=0x62b00028b218, 
    rawbuf=0x62b000292238 "INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread9 QNO 205 CON_ID"..., length=206, parser_state=0x3bb61b22b7c0, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:8062
#19 0x000055a432ffaf91 in dispatch_command (command=COM_QUERY, thd=0x62b00028b218, 
    packet=0x629000e6f219 " INSERT INTO t4 (col1,col2, col_int, col_string, col_text) VALUES ( 1, 1, 1, REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), 10), REPEAT(SUBSTR(CAST( 1 AS CHAR),1,1), @fill_amount) ) /* E_R Thread9 QNO 205 CON_I"..., packet_length=208, is_com_multi=false, is_next_command=false) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1889
#20 0x000055a432ff78ba in do_command (thd=0x62b00028b218) at /Server/bb-10.5-MDEV-24569/sql/sql_parse.cc:1370
#21 0x000055a433439b71 in do_handle_one_connection (connect=0x60800000e738, put_in_cache=true) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1410
#22 0x000055a4334394da in handle_one_connection (arg=0x60800000dfb8) at /Server/bb-10.5-MDEV-24569/sql/sql_connect.cc:1312
#23 0x000055a4341442eb in pfs_spawn_thread (arg=0x615000021798) at /Server/bb-10.5-MDEV-24569/storage/perfschema/pfs.cc:2201
#24 0x00007f67f5fef609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#25 0x000028411beea293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

mleich, please file a separate report for the !other_lock assertion failure, and try to shrink the RQG grammar, if possible. As you can see, the conflict involves two INSERT INTO t4 (the same table).

Comment by Marko Mäkelä [ 2021-01-14 ]

On the merge to 10.6, I discarded the change to replace the fil_space_t::latch S-latch with SX-latch, because MDEV-24167 replaced that with a plain RW-lock. In 10.6, we will avoid the read-latch acquisition if the current thread is already holding the fil_space_t::latch in exclusive mode.

Comment by Matthias Leich [ 2021-01-14 ]

The
mysqld: /Server/bb-10.5-MDEV-24569/storage/innobase/lock/lock0lock.cc:1775: void lock_rec_add_to_queue(unsigned int, const buf_block_t*, ulint, dict_index_t*, trx_t*, bool): Assertion `!other_lock' failed.
is now https://jira.mariadb.org/browse/MDEV-24590

Generated at Thu Feb 08 09:30:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.