[MDEV-16232] Use fewer mini-transactions Created: 2018-05-21  Updated: 2024-01-16

Status: Stalled
Project: MariaDB Server
Component/s: Locking, Optimizer, Storage Engine - InnoDB
Fix Version/s: 11.5

Type: New Feature Priority: Critical
Reporter: Marko Mäkelä Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved Votes: 5
Labels: performance

Attachments: File psergey-mdev16232-poc-r1.diff    
Issue Links:
Blocks
blocks MDEV-16402 Support Index Condition Pushdown for ... Confirmed
blocks MDEV-21452 Use condition variables and normal mu... Closed
blocks MDEV-30078 SQL Layer support for: Use fewer Inno... Stalled
Relates
relates to MDEV-17603 Allow statement-based replication for... Closed
relates to MDEV-21974 InnoDB DML under backup locks make bu... Open
relates to MDEV-24813 Locking full table scan fails to use ... Confirmed
relates to MDEV-33251 Redundant check on prebuilt::n_rows_f... Closed
relates to MDEV-10962 Deadlock with 3 concurrent DELETEs by... Closed
relates to MDEV-11215 Several locks taken to same record in... Stalled
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed
relates to MDEV-16168 Performance regression on sysbench wr... Closed
relates to MDEV-16675 Unnecessary explicit lock acquisition... Closed
relates to MDEV-18746 Reduce the amount of mem_heap_create(... Open
relates to MDEV-22413 Server hangs upon UPDATE/DELETE on a ... Closed
relates to MDEV-24224 Gap lock on delete in 10.5 using READ... Closed
relates to MDEV-26779 reduce lock_sys.wait_mutex contention... Closed
relates to MDEV-30835 Inconsistent blocking of UPDATE and D... Open

 Description   

Since MariaDB 10.2.2, InnoDB never holds any mutexes or RW-locks across handler API calls. (Until that version, btr_search_latch for the adaptive hash index could be held, and there was a special call handlerton::release_temporary_latches.)

During UPDATE operations, and also possibly during reads that perform range scans, it could help a lot to reuse the same InnoDB mini-transaction and to protect the current page with the page latch (buf_block_t::lock) across calls:

  1. Introduce row_prebuilt_t::mtr and keep it open.
  2. Avoid mtr_t::commit() between row reads
  3. Avoid storing & restoring btr_pcur_t position
  4. If there is any possibility of a delay (such as, waiting for a row read from another table, or waiting for client connection I/O), then btr_pcur_store_position(); mtr.commit() will have to be called before the wait and mtr.start();btr_pcur_restore_position(); after it.

This change could remove any benefit of the row_prebuilt_t::fetch_cache (after 4 consecutive row reads, it’d prefetch 8 rows). Removing this cache would greatly reduce the InnoDB memory usage for partitioned tables.

Mini-transactions for single-row UPDATE/DELETE

  1. Search and S-latch the PRIMARY KEY leaf page (get explicit transactional lock)
  2. X-latch the PRIMARY KEY leaf page, update transaction directory page (rollback segment header page), allocate&initialize first undo log page of the transaction
  3. Write undo log record
  4. Modify the PRIMARY KEY index
  5. (For each off-page column, use 1 mini-transaction per page written.)
  6. (For each secondary index, modify the index.)
  7. Commit the user transaction

There are 1 read-only mini-transaction and 4 read-write mini-transactions for a 1-row user transaction! (With MariaDB 10.3.5, only 3 read-write mini-transactions, because the first two writes were merged.)

We can actually use a single mini-transaction for all this. Only if there are secondary indexes or off-page columns, multiple mini-transactions will be needed:

  1. Search and X-latch the PRIMARY KEY leaf page, update transaction directory page, allocate&initialize first undo log page, write undo log record, modify the PRIMARY KEY index (with implicit transactional locking)
  2. (For each off-page column, use 1 mini-transaction per page written.)
  3. (For each secondary index, modify the index.)
  4. Commit the user transaction

If there are no off-page columns or secondary indexes, the user transaction commit can be merged to the same mini-transaction. (This is a special case for a single-row user transaction.)

The merging of the 'read' and 'write' steps under a single page lock would implement implicit locking for UPDATE and DELETE. When there is no locking conflict, this should greatly reduce the contention on lock_sys.mutex.

Using fewer mini-transactions for writes also means less communication with the redo log buffer, which should reduce contention in log_sys.mutex or whatever MDEV-14425 will be replacing it with.

Note: For any record modifications, we must always commit and restart the mini-transaction between rows, because we cannot move to another B-tree page after acquiring an undo page lock. Reads can reuse the same mini-transaction.



 Comments   
Comment by Sergei Petrunia [ 2019-04-03 ]

Notes from discussion with marko at the optimizer call:

  • It seems to feasible to add something like handler->start_scan() / handler->interrupt_scan() function
  • However, they will be called in a tight loop in the nested-loops join, adding overhead.
  • When exactly should the scan be "interrupted"? If we send a row to the client, does that constitute scan interruption? (sending a row to the client may be as easy as copying it to another buffer, or a syscall + waiting for the network buffer to flush?)
  • If the query runs a join between innodb tables (the most common scenario), perhaps we don't need any extra calls. doing a row read on table X should automatically terminate a mini-transaction on table Y.
  • We can easily throw together a test version which will not handle special cases (slow network clients, subqueries in the WHERE condition doing expensive computations, etc etc). It will allow to check the speedup in the best case.
Comment by Sergei Petrunia [ 2019-04-10 ]

attached psergey-mdev16232-poc-r1.diff , which is the SQL-level part of the proof-of-concept prototype.

The patch adds functions into handler:

+  virtual void start_operations_batch();
+  virtual void end_operations_batch();


InnoDB doesn't have to release mini-transaction between these calls (as there won't be any other operations in this time). The patch is not complete at all, but it is expected to handle sysbench workloads

Comment by Marko Mäkelä [ 2021-10-27 ]

I think that we should start work on this fairly soon, because it could significantly improve DELETE and UPDATE performance and also reduce the memory footprint of partitioned tables.

Comment by Thirunarayanan Balathandayuthapani [ 2022-05-26 ]

Patch is in 10.6-MDEV-16232. Apply the following patch:

diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 44cc1ce7d35..6904a081bda 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -21183,14 +21183,12 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
   if (pfs_batch_update)
     join_tab->table->file->start_psi_batch_mode();
 
-#if 0
   bool ops_batch_started= false;
   if (join_tab->next_select == end_send)
   {
     join_tab->table->file->start_operations_batch();
     ops_batch_started= true;
   }
-#endif
 
   if (rc != NESTED_LOOP_NO_MORE_ROWS)
   {
@@ -21239,10 +21237,8 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
     rc= evaluate_join_record(join, join_tab, error);
   }
 
-#if 0
   if (ops_batch_started)
     join_tab->table->file->end_operations_batch();
-#endif
 
   if (rc == NESTED_LOOP_NO_MORE_ROWS &&
       join_tab->last_inner && !join_tab->found)

Test case is:

-- source include/have_innodb.inc
 
create table t1(a int)engine=innodb;
insert into t1 values (1),(2);
select * from t1 where a=(select a from t1 where a > 1);
drop table t1;

Above select does call start_operations_batch() twice and both for the same table.
Both tries to s-lock the clustered index root page of table t1.
start_operations_batch has been called for same select twice.

Thread 20 "mariadbd" hit Breakpoint 2, ha_innobase::start_operations_batch (this=0x7fff94223f90) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
20648     ut_ad(!m_prebuilt->batch_mtr);
(gdb) where
#0  ha_innobase::start_operations_batch (this=0x7fff94223f90) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
#1  0x0000555555fd2d81 in sub_select (join=0x7fff94018870, join_tab=0x7fff94059260, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21189
#2  0x0000555555fd2266 in do_select (join=0x7fff94018870, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736
#3  0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786
#4  0x0000555555fa4afb in JOIN::exec (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564
#5  0x0000555555fa6441 in mysql_select (thd=0x7fff94000d78, tables=0x7fff94015ab0, fields=@0x7fff94015778: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fff94015a68, last = 0x7fff94015a68, elements = 1}, <No data fields>}, conds=0x7fff94017d48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff94018848, unit=0x7fff94005198, select_lex=0x7fff940154d8) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:5044
#6  0x0000555555f956e8 in handle_select (thd=0x7fff94000d78, lex=0x7fff940050c0, result=0x7fff94018848, setup_tables_done_option=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:578
#7  0x0000555555f3e1b4 in execute_sqlcom_select (thd=0x7fff94000d78, all_tables=0x7fff94015ab0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:6260
#8  0x0000555555f35383 in mysql_execute_command (thd=0x7fff94000d78, is_called_from_prepared_stmt=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:3944
#9  0x0000555555f42f77 in mysql_parse (thd=0x7fff94000d78, rawbuf=0x7fff94015410 "select * from t1 where a=(select a from t1 where a > 1)", length=55, parser_state=0x7fffe8259500) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:8036
#10 0x0000555555f2f487 in dispatch_command (command=COM_QUERY, thd=0x7fff94000d78, packet=0x7fff9400b999 "", packet_length=55, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1894
#11 0x0000555555f2dd95 in do_command (thd=0x7fff94000d78, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1407
#12 0x00005555561032b9 in do_handle_one_connection (connect=0x555558cdea68, put_in_cache=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1418
#13 0x0000555556102f54 in handle_one_connection (arg=0x555558dc2d38) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1312
#14 0x00005555565f82e7 in pfs_spawn_thread (arg=0x555558cde5d8) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/perfschema/pfs.cc:2201
#15 0x00007ffff70b36db in start_thread (arg=0x7fffe825a700) at pthread_create.c:463
#16 0x00007ffff621561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
 
Thread 20 "mariadbd" hit Breakpoint 2, ha_innobase::start_operations_batch (this=0x7fff94053b60) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
20648     ut_ad(!m_prebuilt->batch_mtr);
(gdb) where
#0  ha_innobase::start_operations_batch (this=0x7fff94053b60) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
#1  0x0000555555fd2d81 in sub_select (join=0x7fff940551d0, join_tab=0x7fff94056990, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21189
#2  0x0000555555fd2266 in do_select (join=0x7fff940551d0, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736
#3  0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff940551d0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786
#4  0x0000555555fa4afb in JOIN::exec (this=0x7fff940551d0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564
#5  0x0000555556400a82 in subselect_single_select_engine::exec (this=0x7fff94017d08) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:4144
#6  0x00005555563f3895 in Item_subselect::exec (this=0x7fff94017b80) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:854
#7  0x00005555563f592f in Item_singlerow_subselect::val_int (this=0x7fff94017b80) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:1504
#8  0x000055555633c549 in Arg_comparator::compare_int_signed (this=0x7fff94059a70) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.cc:933
#9  0x0000555556353b9e in Arg_comparator::compare (this=0x7fff94059a70) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.h:103
#10 0x000055555633ee72 in Item_func_eq::val_int (this=0x7fff940599c0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.cc:1762
#11 0x0000555555fd3248 in evaluate_join_record (join=0x7fff94018870, join_tab=0x7fff94059260, error=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21299
#12 0x0000555555fd2e01 in sub_select (join=0x7fff94018870, join_tab=0x7fff94059260, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21198
#13 0x0000555555fd2266 in do_select (join=0x7fff94018870, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736
#14 0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786
#15 0x0000555555fa4afb in JOIN::exec (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564
#16 0x0000555555fa6441 in mysql_select (thd=0x7fff94000d78, tables=0x7fff94015ab0, fields=@0x7fff94015778: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fff94015a68, last = 0x7fff94015a68, elements = 1}, <No data fields>}, conds=0x7fff94017d48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff94018848, unit=0x7fff94005198, select_lex=0x7fff940154d8) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:5044
#17 0x0000555555f956e8 in handle_select (thd=0x7fff94000d78, lex=0x7fff940050c0, result=0x7fff94018848, setup_tables_done_option=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:578
#18 0x0000555555f3e1b4 in execute_sqlcom_select (thd=0x7fff94000d78, all_tables=0x7fff94015ab0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:6260
#19 0x0000555555f35383 in mysql_execute_command (thd=0x7fff94000d78, is_called_from_prepared_stmt=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:3944
#20 0x0000555555f42f77 in mysql_parse (thd=0x7fff94000d78, rawbuf=0x7fff94015410 "select * from t1 where a=(select a from t1 where a > 1)", length=55, parser_state=0x7fffe8259500) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:8036
#21 0x0000555555f2f487 in dispatch_command (command=COM_QUERY, thd=0x7fff94000d78, packet=0x7fff9400b999 "", packet_length=55, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1894
#22 0x0000555555f2dd95 in do_command (thd=0x7fff94000d78, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1407
#23 0x00005555561032b9 in do_handle_one_connection (connect=0x555558cdea68, put_in_cache=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1418
#24 0x0000555556102f54 in handle_one_connection (arg=0x555558dc2d38) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1312
#25 0x00005555565f82e7 in pfs_spawn_thread (arg=0x555558cde5d8) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/perfschema/pfs.cc:2201
#26 0x00007ffff70b36db in start_thread (arg=0x7fffe825a700) at pthread_create.c:463
#27 0x00007ffff621561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Sergei Petrunia [ 2022-06-08 ]

Pushed this:

This is a prototype-quality patch for the SQL layer which should unblock the InnoDB team.

commit 92665b862b479f6582e3a15247bfc812e59e7e34 (HEAD -> 10.9-MDEV-16232, origin/10.9-MDEV-16232)
Author: Sergei Petrunia <sergey@mariadb.com>
Date:   Wed Jun 8 15:43:06 2022 +0300
 
    MDEV-16232: Use fewer mini-transactions
    
    SQL layer support part #2. The main idea is:
    employ this optimization only when the query is using just one table.
    (the scope is too limited but the goal is to get sysbench to work).
    
    Make UPDATE, DELETE, and SELECT codepaths check that the query has just
    one table before the optimization is employed.
    
    Also: in DELETE code, the start_operations_batch/end_operations_batch
    were in the "Direct DELETE" branch. I think it's a mistake as InnoDB
    doesn't use Direct DELETE optimization.
    Removed the calls from there and added them to the regular DELETE codepath.

Comment by Marko Mäkelä [ 2022-10-31 ]

I created a branch bb-10.5-MDEV-16232 to let a user confirm whether their observation of increased memory usage is caused by the row prefetch cache. The cache would be removed as part of this task.

Comment by Sergei Petrunia [ 2022-11-23 ]

(a note so that this questoin is not forgotten):
Why does this patch:

commit eb63f8b9a56fe673e5bf8add20f6b457c7032e69
Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
Date:   Thu May 26 16:26:23 2022 +0530
 
    MDEV-16232      Use fewer mini-transactions
    
    - For update operation, InnoDB uses separate mtr for the following:
    ...

add implementation of start_operations_batch() and end_operations_batch() for every engine? Won't it be easier if there was a default implementation, handler::start_operations_batch() which did nothing?

Comment by Marko Mäkelä [ 2022-12-29 ]

Would terminology like opening or closing a page cursor be acceptable?

I have the impression that system versioning is mostly performing its special handling of UPDATE and DELETE inside InnoDB. I do not foresee a problem with that.

For a read that is executed as part of INSERT…SELECT, CREATE…SELECT or even ALTER TABLE…ALGORITHM=COPY (in copy_data_between_tables()), it would be good if the SQL layer fetched multiple rows in a batch, before ending the batch and inserting the rows.

An infinite loop in INSERT INTO t1 SELECT FROM t1 might be avoided by the fact that it is using a locking read. (Related to that, see also MDEV-24813.) Perhaps locking reads will ignore records that were inserted by the current transaction. This optimization should not affect that.

Comment by Thirunarayanan Balathandayuthapani [ 2023-03-07 ]

--source include/have_innodb.inc
create table t1 (v3 int primary key, v2 text(100) unique not null) engine=innodb;
insert into t1 values ( -32768 , -128) , (-1 , 44);
update t1 set v2 = 1;
drop table t1;

Above test case fails in 10.11-MDEV-16232 branch.

In update_stmt(), InnoDB do start batch mini-transaction and holds the lock on clustered
index record and secondary index records. Since this table has long unique blob
(table->s->long_unique_table), we do clone the current handler and it leads
to different prebuilt.

(rr) where
#0  0x0000563635d95fc5 in ha_innobase::open (this=0x7f66240163a0, name=0x7f66240919a8 "test/t1") at ../storage/innobase/handler/ha_innodb.cc:5959
#1  0x000056363595c6ca in handler::ha_open (this=0x7f66240163a0, table_arg=0x7f66241e0238, name=0x7f66240919a8 "test/t1", mode=33, test_if_locked=2, mem_root=0x7f6624006d78, partitions_to_open=0x0) at ../sql/handler.cc:3349
#2  0x000056363595bcb1 in handler::clone (this=0x7f6624238ef0, name=0x7f66240919a8 "test/t1", mem_root=0x7f6624006d78) at ../sql/handler.cc:3209
#3  0x0000563635d96ce2 in ha_innobase::clone (this=0x7f6624238ef0, name=0x7f66241dfff0 "./test/t1", mem_root=0x7f6624006d78) at ../storage/innobase/handler/ha_innodb.cc:6237
#4  0x000056363595bd4d in handler::create_lookup_handler (this=0x7f6624238ef0) at ../sql/handler.cc:3232
#5  0x000056363596af02 in handler::prepare_for_insert (this=0x7f6624238ef0, do_create=true) at ../sql/handler.cc:7543
#6  0x00005636356dc24a in mysql_update (thd=0x7f6624000db8, table_list=0x7f6624015580, fields=@0x7f6624005cc8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7f6624015e50, last = 0x7f6624015e50, elements = 1}, <No data fields>}, values=@0x7f6624006100: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7f6624015e60, last = 0x7f6624015e60, elements = 1}, <No data fields>}, conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7f6684058de0, updated_return=0x7f6684058ed0) at ../sql/sql_update.cc:1033
#7  0x00005636355833b1 in mysql_execute_command (thd=0x7f6624000db8, is_called_from_prepared_stmt=false) at ../sql/sql_parse.cc:4410
#8  0x000056363558f60a in mysql_parse (thd=0x7f6624000db8, rawbuf=0x7f66240154b0 "update t1 set v2 = 1", length=20, parser_state=0x7f6684059390) at ../sql/sql_parse.cc:8000
#9  0x000056363557ba9d in dispatch_command (command=COM_QUERY, thd=0x7f6624000db8, packet=0x7f662400ba39 "update t1 set v2 = 1", packet_length=20, blocking=true) at ../sql/sql_parse.cc:1894
#10 0x000056363557a45c in do_command (thd=0x7f6624000db8, blocking=true) at ../sql/sql_parse.cc:1407
#11 0x000056363575b9b5 in do_handle_one_connection (connect=0x5636390913d8, put_in_cache=true) at ../sql/sql_connect.cc:1416
#12 0x000056363575b716 in handle_one_connection (arg=0x563639175b18) at ../sql/sql_connect.cc:1318
#13 0x0000563635cb72b5 in pfs_spawn_thread (arg=0x56363913d9e8) at ../storage/perfschema/pfs.cc:2201
#14 0x00007f669bceb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f669b8bc163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

After creating the new record, we do lookup in unique index with the cloned prebuilt:

gdb) where
#0  row_search_mvcc (buf=0x555556c08bee <_my_thread_var+24> "]\303\363\017\036\372UH\211\345H\203\354\020H\211}\370H\215\005i9u\001\213", mode=32767, prebuilt=0x1000056c08d2a, match_mode=140736959523632, direction=0) at ../storage/innobase/row/row0sel.cc:4347
#1  0x00005555567667a2 in ha_innobase::index_read (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key_ptr=0x7fffe07ab63f "\364\001", key_len=8, find_flag=HA_READ_KEY_EXACT) at ../storage/innobase/handler/ha_innodb.cc:9004
#2  0x000055555624da34 in handler::index_read_map (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key=0x7fffe07ab63f "\364\001", keypart_map=18446744073709551615, find_flag=HA_READ_KEY_EXACT) at ../sql/handler.h:3861
#3  0x000055555632b23a in handler::ha_index_read_map (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key=0x7fffe07ab63f "\364\001", keypart_map=18446744073709551615, find_flag=HA_READ_KEY_EXACT) at ../sql/handler.cc:3488
#4  0x000055555633731d in handler::check_duplicate_long_entry_key (this=0x7fffb023a560, new_rec=0x7fffb02292e8 "", key_no=1) at ../sql/handler.cc:7248
#5  0x0000555556337c8a in handler::check_duplicate_long_entries_update (this=0x7fffb023a560, new_rec=0x7fffb02292e8 "") at ../sql/handler.cc:7381
#6  0x0000555556339033 in handler::ha_update_row (this=0x7fffb023a560, old_data=0x7fffb0229300 "", new_data=0x7fffb02292e8 "") at ../sql/handler.cc:7664
#7  0x00005555560b100a in mysql_update (thd=0x7fffb0000db8, table_list=0x7fffb0015560, fields=@0x7fffb0005ca8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffb0015e20, last = 0x7fffb0015e20, elements = 1}, <No data fields>}, values=@0x7fffb0006100: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffb0015e30, last = 0x7fffb0015e30, elements = 1}, <No data fields>}, conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7fffe07abe30, updated_return=0x7fffe07abf20) at ../sql/sql_update.cc:1092
#8  0x0000555555f5ae92 in mysql_execute_command (thd=0x7fffb0000db8, is_called_from_prepared_stmt=false) at ../sql/sql_parse.cc:4411
#9  0x0000555555f67119 in mysql_parse (thd=0x7fffb0000db8, rawbuf=0x7fffb0015490 "update t1 set v2 = 1", length=20, parser_state=0x7fffe07ac3a0) at ../sql/sql_parse.cc:8019
#10 0x0000555555f5359a in dispatch_command (command=COM_QUERY, thd=0x7fffb0000db8, packet=0x7fffb000b9f9 "update t1 set v2 = 1", packet_length=20, blocking=true) at ../sql/sql_parse.cc:1894
#11 0x0000555555f51f59 in do_command (thd=0x7fffb0000db8, blocking=true) at ../sql/sql_parse.cc:1407
#12 0x000055555612ef32 in do_handle_one_connection (connect=0x555558b4c218, put_in_cache=true) at ../sql/sql_connect.cc:1416
#13 0x000055555612ec93 in handle_one_connection (arg=0x555558be79f8) at ../sql/sql_connect.cc:1318
#14 0x0000555556681cef in pfs_spawn_thread (arg=0x555558b4bd88) at ../storage/perfschema/pfs.cc:2201
#15 0x00007ffff7f3e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007ffff7b0f163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

So update creates one mini-transaction(prebuilt), check_duplicate_long_entry_key() works on different prebuilt. This leads to block lock assertion.

Comment by Marko Mäkelä [ 2023-05-24 ]

I conducted a small performance test, with Sysbench oltp_read_only and oltp_update_non_index, 4×10000 rows, 96 threads, 30 seconds on /dev/shm, on a dual socket system with 40 threads (2×10 without hyperthreading).

revision oltp_read_only/qps oltp_update_non_index/tps
10.11 (baseline) 324540.53 212804.57
Thiru’s prototype 324430.26 211264.99
removing row_prebuilt_t::fetch_cache 297972.85 211938.88

To my surprise, removing the row_prebuilt_t::fetch_cache, which was one of my motivations for this task, is reducing performance. We will need to find out why.

Before running any further UPDATE or DELETE benchmarks, I think that thiru needs to revise some conditions around the sel_set_rec_lock() calls in row_search_mvcc(), so that DELETE and UPDATE will rely more on implicit locking, like INSERT does. For secondary indexes, some gap locks will be necessary due to reasons explained in MDEV-16406.

Generated at Thu Feb 08 08:27:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.