Details

    Description

      Since MariaDB 10.2.2, InnoDB never holds any mutexes or RW-locks across handler API calls. (Until that version, btr_search_latch for the adaptive hash index could be held, and there was a special call handlerton::release_temporary_latches.)

      During UPDATE operations, and also possibly during reads that perform range scans, it could help a lot to reuse the same InnoDB mini-transaction and to protect the current page with the page latch (buf_block_t::lock) across calls:

      1. Introduce row_prebuilt_t::mtr and keep it open.
      2. Avoid mtr_t::commit() between row reads
      3. Avoid storing & restoring btr_pcur_t position
      4. If there is any possibility of a delay (such as, waiting for a row read from another table, or waiting for client connection I/O), then btr_pcur_store_position(); mtr.commit() will have to be called before the wait and mtr.start();btr_pcur_restore_position(); after it.

      This change could remove any benefit of the row_prebuilt_t::fetch_cache (after 4 consecutive row reads, it’d prefetch 8 rows). Removing this cache would greatly reduce the InnoDB memory usage for partitioned tables.

      Mini-transactions for single-row UPDATE/DELETE

      1. Search and S-latch the PRIMARY KEY leaf page (get explicit transactional lock)
      2. X-latch the PRIMARY KEY leaf page, update transaction directory page (rollback segment header page), allocate&initialize first undo log page of the transaction
      3. Write undo log record
      4. Modify the PRIMARY KEY index
      5. (For each off-page column, use 1 mini-transaction per page written.)
      6. (For each secondary index, modify the index.)
      7. Commit the user transaction

      There are 1 read-only mini-transaction and 4 read-write mini-transactions for a 1-row user transaction! (With MariaDB 10.3.5, only 3 read-write mini-transactions, because the first two writes were merged.)

      We can actually use a single mini-transaction for all this. Only if there are secondary indexes or off-page columns, multiple mini-transactions will be needed:

      1. Search and X-latch the PRIMARY KEY leaf page, update transaction directory page, allocate&initialize first undo log page, write undo log record, modify the PRIMARY KEY index (with implicit transactional locking)
      2. (For each off-page column, use 1 mini-transaction per page written.)
      3. (For each secondary index, modify the index.)
      4. Commit the user transaction

      If there are no off-page columns or secondary indexes, the user transaction commit can be merged to the same mini-transaction. (This is a special case for a single-row user transaction.)

      The merging of the 'read' and 'write' steps under a single page lock would implement implicit locking for UPDATE and DELETE. When there is no locking conflict, this should greatly reduce the contention on lock_sys.mutex.

      Using fewer mini-transactions for writes also means less communication with the redo log buffer, which should reduce contention in log_sys.mutex or whatever MDEV-14425 will be replacing it with.

      Note: For any record modifications, we must always commit and restart the mini-transaction between rows, because we cannot move to another B-tree page after acquiring an undo page lock. Reads can reuse the same mini-transaction.

      Attachments

        Issue Links

          Activity

            Notes from discussion with marko at the optimizer call:

            • It seems to feasible to add something like handler->start_scan() / handler->interrupt_scan() function
            • However, they will be called in a tight loop in the nested-loops join, adding overhead.
            • When exactly should the scan be "interrupted"? If we send a row to the client, does that constitute scan interruption? (sending a row to the client may be as easy as copying it to another buffer, or a syscall + waiting for the network buffer to flush?)
            • If the query runs a join between innodb tables (the most common scenario), perhaps we don't need any extra calls. doing a row read on table X should automatically terminate a mini-transaction on table Y.
            • We can easily throw together a test version which will not handle special cases (slow network clients, subqueries in the WHERE condition doing expensive computations, etc etc). It will allow to check the speedup in the best case.
            psergei Sergei Petrunia added a comment - Notes from discussion with marko at the optimizer call: It seems to feasible to add something like handler->start_scan() / handler->interrupt_scan() function However, they will be called in a tight loop in the nested-loops join, adding overhead. When exactly should the scan be "interrupted"? If we send a row to the client, does that constitute scan interruption? (sending a row to the client may be as easy as copying it to another buffer, or a syscall + waiting for the network buffer to flush?) If the query runs a join between innodb tables (the most common scenario), perhaps we don't need any extra calls. doing a row read on table X should automatically terminate a mini-transaction on table Y. We can easily throw together a test version which will not handle special cases (slow network clients, subqueries in the WHERE condition doing expensive computations, etc etc). It will allow to check the speedup in the best case.

            attached psergey-mdev16232-poc-r1.diff , which is the SQL-level part of the proof-of-concept prototype.

            The patch adds functions into handler:

            +  virtual void start_operations_batch();
            +  virtual void end_operations_batch();
            


            InnoDB doesn't have to release mini-transaction between these calls (as there won't be any other operations in this time). The patch is not complete at all, but it is expected to handle sysbench workloads

            psergei Sergei Petrunia added a comment - attached psergey-mdev16232-poc-r1.diff , which is the SQL-level part of the proof-of-concept prototype. The patch adds functions into handler: + virtual void start_operations_batch(); + virtual void end_operations_batch();  InnoDB doesn't have to release mini-transaction between these calls (as there won't be any other operations in this time). The patch is not complete at all, but it is expected to handle sysbench workloads

            I think that we should start work on this fairly soon, because it could significantly improve DELETE and UPDATE performance and also reduce the memory footprint of partitioned tables.

            marko Marko Mäkelä added a comment - I think that we should start work on this fairly soon, because it could significantly improve DELETE and UPDATE performance and also reduce the memory footprint of partitioned tables.

            Patch is in 10.6-MDEV-16232. Apply the following patch:

            diff --git a/sql/sql_select.cc b/sql/sql_select.cc
            index 44cc1ce7d35..6904a081bda 100644
            --- a/sql/sql_select.cc
            +++ b/sql/sql_select.cc
            @@ -21183,14 +21183,12 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
               if (pfs_batch_update)
                 join_tab->table->file->start_psi_batch_mode();
             
            -#if 0
               bool ops_batch_started= false;
               if (join_tab->next_select == end_send)
               {
                 join_tab->table->file->start_operations_batch();
                 ops_batch_started= true;
               }
            -#endif
             
               if (rc != NESTED_LOOP_NO_MORE_ROWS)
               {
            @@ -21239,10 +21237,8 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records)
                 rc= evaluate_join_record(join, join_tab, error);
               }
             
            -#if 0
               if (ops_batch_started)
                 join_tab->table->file->end_operations_batch();
            -#endif
             
               if (rc == NESTED_LOOP_NO_MORE_ROWS &&
                   join_tab->last_inner && !join_tab->found)
            

            Test case is:

            -- source include/have_innodb.inc
             
            create table t1(a int)engine=innodb;
            insert into t1 values (1),(2);
            select * from t1 where a=(select a from t1 where a > 1);
            drop table t1;
            

            Above select does call start_operations_batch() twice and both for the same table.
            Both tries to s-lock the clustered index root page of table t1.
            start_operations_batch has been called for same select twice.

            Thread 20 "mariadbd" hit Breakpoint 2, ha_innobase::start_operations_batch (this=0x7fff94223f90) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
            20648     ut_ad(!m_prebuilt->batch_mtr);
            (gdb) where
            #0  ha_innobase::start_operations_batch (this=0x7fff94223f90) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
            #1  0x0000555555fd2d81 in sub_select (join=0x7fff94018870, join_tab=0x7fff94059260, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21189
            #2  0x0000555555fd2266 in do_select (join=0x7fff94018870, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736
            #3  0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786
            #4  0x0000555555fa4afb in JOIN::exec (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564
            #5  0x0000555555fa6441 in mysql_select (thd=0x7fff94000d78, tables=0x7fff94015ab0, fields=@0x7fff94015778: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fff94015a68, last = 0x7fff94015a68, elements = 1}, <No data fields>}, conds=0x7fff94017d48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff94018848, unit=0x7fff94005198, select_lex=0x7fff940154d8) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:5044
            #6  0x0000555555f956e8 in handle_select (thd=0x7fff94000d78, lex=0x7fff940050c0, result=0x7fff94018848, setup_tables_done_option=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:578
            #7  0x0000555555f3e1b4 in execute_sqlcom_select (thd=0x7fff94000d78, all_tables=0x7fff94015ab0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:6260
            #8  0x0000555555f35383 in mysql_execute_command (thd=0x7fff94000d78, is_called_from_prepared_stmt=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:3944
            #9  0x0000555555f42f77 in mysql_parse (thd=0x7fff94000d78, rawbuf=0x7fff94015410 "select * from t1 where a=(select a from t1 where a > 1)", length=55, parser_state=0x7fffe8259500) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:8036
            #10 0x0000555555f2f487 in dispatch_command (command=COM_QUERY, thd=0x7fff94000d78, packet=0x7fff9400b999 "", packet_length=55, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1894
            #11 0x0000555555f2dd95 in do_command (thd=0x7fff94000d78, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1407
            #12 0x00005555561032b9 in do_handle_one_connection (connect=0x555558cdea68, put_in_cache=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1418
            #13 0x0000555556102f54 in handle_one_connection (arg=0x555558dc2d38) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1312
            #14 0x00005555565f82e7 in pfs_spawn_thread (arg=0x555558cde5d8) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/perfschema/pfs.cc:2201
            #15 0x00007ffff70b36db in start_thread (arg=0x7fffe825a700) at pthread_create.c:463
            #16 0x00007ffff621561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
             
             
            Thread 20 "mariadbd" hit Breakpoint 2, ha_innobase::start_operations_batch (this=0x7fff94053b60) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
            20648     ut_ad(!m_prebuilt->batch_mtr);
            (gdb) where
            #0  ha_innobase::start_operations_batch (this=0x7fff94053b60) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648
            #1  0x0000555555fd2d81 in sub_select (join=0x7fff940551d0, join_tab=0x7fff94056990, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21189
            #2  0x0000555555fd2266 in do_select (join=0x7fff940551d0, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736
            #3  0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff940551d0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786
            #4  0x0000555555fa4afb in JOIN::exec (this=0x7fff940551d0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564
            #5  0x0000555556400a82 in subselect_single_select_engine::exec (this=0x7fff94017d08) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:4144
            #6  0x00005555563f3895 in Item_subselect::exec (this=0x7fff94017b80) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:854
            #7  0x00005555563f592f in Item_singlerow_subselect::val_int (this=0x7fff94017b80) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:1504
            #8  0x000055555633c549 in Arg_comparator::compare_int_signed (this=0x7fff94059a70) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.cc:933
            #9  0x0000555556353b9e in Arg_comparator::compare (this=0x7fff94059a70) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.h:103
            #10 0x000055555633ee72 in Item_func_eq::val_int (this=0x7fff940599c0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.cc:1762
            #11 0x0000555555fd3248 in evaluate_join_record (join=0x7fff94018870, join_tab=0x7fff94059260, error=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21299
            #12 0x0000555555fd2e01 in sub_select (join=0x7fff94018870, join_tab=0x7fff94059260, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21198
            #13 0x0000555555fd2266 in do_select (join=0x7fff94018870, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736
            #14 0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786
            #15 0x0000555555fa4afb in JOIN::exec (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564
            #16 0x0000555555fa6441 in mysql_select (thd=0x7fff94000d78, tables=0x7fff94015ab0, fields=@0x7fff94015778: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fff94015a68, last = 0x7fff94015a68, elements = 1}, <No data fields>}, conds=0x7fff94017d48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff94018848, unit=0x7fff94005198, select_lex=0x7fff940154d8) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:5044
            #17 0x0000555555f956e8 in handle_select (thd=0x7fff94000d78, lex=0x7fff940050c0, result=0x7fff94018848, setup_tables_done_option=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:578
            #18 0x0000555555f3e1b4 in execute_sqlcom_select (thd=0x7fff94000d78, all_tables=0x7fff94015ab0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:6260
            #19 0x0000555555f35383 in mysql_execute_command (thd=0x7fff94000d78, is_called_from_prepared_stmt=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:3944
            #20 0x0000555555f42f77 in mysql_parse (thd=0x7fff94000d78, rawbuf=0x7fff94015410 "select * from t1 where a=(select a from t1 where a > 1)", length=55, parser_state=0x7fffe8259500) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:8036
            #21 0x0000555555f2f487 in dispatch_command (command=COM_QUERY, thd=0x7fff94000d78, packet=0x7fff9400b999 "", packet_length=55, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1894
            #22 0x0000555555f2dd95 in do_command (thd=0x7fff94000d78, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1407
            #23 0x00005555561032b9 in do_handle_one_connection (connect=0x555558cdea68, put_in_cache=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1418
            #24 0x0000555556102f54 in handle_one_connection (arg=0x555558dc2d38) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1312
            #25 0x00005555565f82e7 in pfs_spawn_thread (arg=0x555558cde5d8) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/perfschema/pfs.cc:2201
            #26 0x00007ffff70b36db in start_thread (arg=0x7fffe825a700) at pthread_create.c:463
            #27 0x00007ffff621561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            
            

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in 10.6- MDEV-16232 . Apply the following patch: diff --git a/sql/sql_select.cc b/sql/sql_select.cc index 44cc1ce7d35..6904a081bda 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -21183,14 +21183,12 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records) if (pfs_batch_update) join_tab->table->file->start_psi_batch_mode(); -#if 0 bool ops_batch_started= false; if (join_tab->next_select == end_send) { join_tab->table->file->start_operations_batch(); ops_batch_started= true; } -#endif if (rc != NESTED_LOOP_NO_MORE_ROWS) { @@ -21239,10 +21237,8 @@ sub_select(JOIN *join,JOIN_TAB *join_tab,bool end_of_records) rc= evaluate_join_record(join, join_tab, error); } -#if 0 if (ops_batch_started) join_tab->table->file->end_operations_batch(); -#endif if (rc == NESTED_LOOP_NO_MORE_ROWS && join_tab->last_inner && !join_tab->found) Test case is: -- source include/have_innodb.inc   create table t1(a int)engine=innodb; insert into t1 values (1),(2); select * from t1 where a=(select a from t1 where a > 1); drop table t1; Above select does call start_operations_batch() twice and both for the same table. Both tries to s-lock the clustered index root page of table t1. start_operations_batch has been called for same select twice. Thread 20 "mariadbd" hit Breakpoint 2, ha_innobase::start_operations_batch (this=0x7fff94223f90) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648 20648 ut_ad(!m_prebuilt->batch_mtr); (gdb) where #0 ha_innobase::start_operations_batch (this=0x7fff94223f90) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648 #1 0x0000555555fd2d81 in sub_select (join=0x7fff94018870, join_tab=0x7fff94059260, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21189 #2 0x0000555555fd2266 in do_select (join=0x7fff94018870, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736 #3 0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786 #4 0x0000555555fa4afb in JOIN::exec (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564 #5 0x0000555555fa6441 in mysql_select (thd=0x7fff94000d78, tables=0x7fff94015ab0, fields=@0x7fff94015778: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fff94015a68, last = 0x7fff94015a68, elements = 1}, <No data fields>}, conds=0x7fff94017d48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff94018848, unit=0x7fff94005198, select_lex=0x7fff940154d8) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:5044 #6 0x0000555555f956e8 in handle_select (thd=0x7fff94000d78, lex=0x7fff940050c0, result=0x7fff94018848, setup_tables_done_option=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:578 #7 0x0000555555f3e1b4 in execute_sqlcom_select (thd=0x7fff94000d78, all_tables=0x7fff94015ab0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:6260 #8 0x0000555555f35383 in mysql_execute_command (thd=0x7fff94000d78, is_called_from_prepared_stmt=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:3944 #9 0x0000555555f42f77 in mysql_parse (thd=0x7fff94000d78, rawbuf=0x7fff94015410 "select * from t1 where a=(select a from t1 where a > 1)", length=55, parser_state=0x7fffe8259500) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:8036 #10 0x0000555555f2f487 in dispatch_command (command=COM_QUERY, thd=0x7fff94000d78, packet=0x7fff9400b999 "", packet_length=55, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1894 #11 0x0000555555f2dd95 in do_command (thd=0x7fff94000d78, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1407 #12 0x00005555561032b9 in do_handle_one_connection (connect=0x555558cdea68, put_in_cache=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1418 #13 0x0000555556102f54 in handle_one_connection (arg=0x555558dc2d38) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1312 #14 0x00005555565f82e7 in pfs_spawn_thread (arg=0x555558cde5d8) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/perfschema/pfs.cc:2201 #15 0x00007ffff70b36db in start_thread (arg=0x7fffe825a700) at pthread_create.c:463 #16 0x00007ffff621561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95     Thread 20 "mariadbd" hit Breakpoint 2, ha_innobase::start_operations_batch (this=0x7fff94053b60) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648 20648 ut_ad(!m_prebuilt->batch_mtr); (gdb) where #0 ha_innobase::start_operations_batch (this=0x7fff94053b60) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/innobase/handler/ha_innodb.cc:20648 #1 0x0000555555fd2d81 in sub_select (join=0x7fff940551d0, join_tab=0x7fff94056990, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21189 #2 0x0000555555fd2266 in do_select (join=0x7fff940551d0, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736 #3 0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff940551d0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786 #4 0x0000555555fa4afb in JOIN::exec (this=0x7fff940551d0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564 #5 0x0000555556400a82 in subselect_single_select_engine::exec (this=0x7fff94017d08) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:4144 #6 0x00005555563f3895 in Item_subselect::exec (this=0x7fff94017b80) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:854 #7 0x00005555563f592f in Item_singlerow_subselect::val_int (this=0x7fff94017b80) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_subselect.cc:1504 #8 0x000055555633c549 in Arg_comparator::compare_int_signed (this=0x7fff94059a70) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.cc:933 #9 0x0000555556353b9e in Arg_comparator::compare (this=0x7fff94059a70) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.h:103 #10 0x000055555633ee72 in Item_func_eq::val_int (this=0x7fff940599c0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/item_cmpfunc.cc:1762 #11 0x0000555555fd3248 in evaluate_join_record (join=0x7fff94018870, join_tab=0x7fff94059260, error=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21299 #12 0x0000555555fd2e01 in sub_select (join=0x7fff94018870, join_tab=0x7fff94059260, end_of_records=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:21198 #13 0x0000555555fd2266 in do_select (join=0x7fff94018870, procedure=0x0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:20736 #14 0x0000555555fa5a7f in JOIN::exec_inner (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4786 #15 0x0000555555fa4afb in JOIN::exec (this=0x7fff94018870) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:4564 #16 0x0000555555fa6441 in mysql_select (thd=0x7fff94000d78, tables=0x7fff94015ab0, fields=@0x7fff94015778: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fff94015a68, last = 0x7fff94015a68, elements = 1}, <No data fields>}, conds=0x7fff94017d48, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fff94018848, unit=0x7fff94005198, select_lex=0x7fff940154d8) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:5044 #17 0x0000555555f956e8 in handle_select (thd=0x7fff94000d78, lex=0x7fff940050c0, result=0x7fff94018848, setup_tables_done_option=0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_select.cc:578 #18 0x0000555555f3e1b4 in execute_sqlcom_select (thd=0x7fff94000d78, all_tables=0x7fff94015ab0) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:6260 #19 0x0000555555f35383 in mysql_execute_command (thd=0x7fff94000d78, is_called_from_prepared_stmt=false) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:3944 #20 0x0000555555f42f77 in mysql_parse (thd=0x7fff94000d78, rawbuf=0x7fff94015410 "select * from t1 where a=(select a from t1 where a > 1)", length=55, parser_state=0x7fffe8259500) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:8036 #21 0x0000555555f2f487 in dispatch_command (command=COM_QUERY, thd=0x7fff94000d78, packet=0x7fff9400b999 "", packet_length=55, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1894 #22 0x0000555555f2dd95 in do_command (thd=0x7fff94000d78, blocking=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_parse.cc:1407 #23 0x00005555561032b9 in do_handle_one_connection (connect=0x555558cdea68, put_in_cache=true) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1418 #24 0x0000555556102f54 in handle_one_connection (arg=0x555558dc2d38) at /home/thiru/mariarepo/server/10.9/10.9-test/sql/sql_connect.cc:1312 #25 0x00005555565f82e7 in pfs_spawn_thread (arg=0x555558cde5d8) at /home/thiru/mariarepo/server/10.9/10.9-test/storage/perfschema/pfs.cc:2201 #26 0x00007ffff70b36db in start_thread (arg=0x7fffe825a700) at pthread_create.c:463 #27 0x00007ffff621561f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

            Pushed this:

            This is a prototype-quality patch for the SQL layer which should unblock the InnoDB team.

            commit 92665b862b479f6582e3a15247bfc812e59e7e34 (HEAD -> 10.9-MDEV-16232, origin/10.9-MDEV-16232)
            Author: Sergei Petrunia <sergey@mariadb.com>
            Date:   Wed Jun 8 15:43:06 2022 +0300
             
                MDEV-16232: Use fewer mini-transactions
                
                SQL layer support part #2. The main idea is:
                employ this optimization only when the query is using just one table.
                (the scope is too limited but the goal is to get sysbench to work).
                
                Make UPDATE, DELETE, and SELECT codepaths check that the query has just
                one table before the optimization is employed.
                
                Also: in DELETE code, the start_operations_batch/end_operations_batch
                were in the "Direct DELETE" branch. I think it's a mistake as InnoDB
                doesn't use Direct DELETE optimization.
                Removed the calls from there and added them to the regular DELETE codepath.
            
            

            psergei Sergei Petrunia added a comment - Pushed this: This is a prototype-quality patch for the SQL layer which should unblock the InnoDB team. commit 92665b862b479f6582e3a15247bfc812e59e7e34 (HEAD -> 10.9-MDEV-16232, origin/10.9-MDEV-16232) Author: Sergei Petrunia <sergey@mariadb.com> Date: Wed Jun 8 15:43:06 2022 +0300   MDEV-16232: Use fewer mini-transactions SQL layer support part #2. The main idea is: employ this optimization only when the query is using just one table. (the scope is too limited but the goal is to get sysbench to work). Make UPDATE, DELETE, and SELECT codepaths check that the query has just one table before the optimization is employed. Also: in DELETE code, the start_operations_batch/end_operations_batch were in the "Direct DELETE" branch. I think it's a mistake as InnoDB doesn't use Direct DELETE optimization. Removed the calls from there and added them to the regular DELETE codepath.

            I created a branch bb-10.5-MDEV-16232 to let a user confirm whether their observation of increased memory usage is caused by the row prefetch cache. The cache would be removed as part of this task.

            marko Marko Mäkelä added a comment - I created a branch bb-10.5-MDEV-16232 to let a user confirm whether their observation of increased memory usage is caused by the row prefetch cache. The cache would be removed as part of this task.

            (a note so that this questoin is not forgotten):
            Why does this patch:

            commit eb63f8b9a56fe673e5bf8add20f6b457c7032e69
            Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com>
            Date:   Thu May 26 16:26:23 2022 +0530
             
                MDEV-16232      Use fewer mini-transactions
                
                - For update operation, InnoDB uses separate mtr for the following:
                ...
            

            add implementation of start_operations_batch() and end_operations_batch() for every engine? Won't it be easier if there was a default implementation, handler::start_operations_batch() which did nothing?

            psergei Sergei Petrunia added a comment - (a note so that this questoin is not forgotten): Why does this patch: commit eb63f8b9a56fe673e5bf8add20f6b457c7032e69 Author: Thirunarayanan Balathandayuthapani <thiru@mariadb.com> Date: Thu May 26 16:26:23 2022 +0530   MDEV-16232 Use fewer mini-transactions - For update operation, InnoDB uses separate mtr for the following: ... add implementation of start_operations_batch() and end_operations_batch() for every engine? Won't it be easier if there was a default implementation, handler::start_operations_batch() which did nothing?

            Would terminology like opening or closing a page cursor be acceptable?

            I have the impression that system versioning is mostly performing its special handling of UPDATE and DELETE inside InnoDB. I do not foresee a problem with that.

            For a read that is executed as part of INSERT…SELECT, CREATE…SELECT or even ALTER TABLE…ALGORITHM=COPY (in copy_data_between_tables()), it would be good if the SQL layer fetched multiple rows in a batch, before ending the batch and inserting the rows.

            An infinite loop in INSERT INTO t1 SELECT FROM t1 might be avoided by the fact that it is using a locking read. (Related to that, see also MDEV-24813.) Perhaps locking reads will ignore records that were inserted by the current transaction. This optimization should not affect that.

            marko Marko Mäkelä added a comment - Would terminology like opening or closing a page cursor be acceptable? I have the impression that system versioning is mostly performing its special handling of UPDATE and DELETE inside InnoDB. I do not foresee a problem with that. For a read that is executed as part of INSERT…SELECT , CREATE…SELECT or even ALTER TABLE…ALGORITHM=COPY (in copy_data_between_tables() ), it would be good if the SQL layer fetched multiple rows in a batch, before ending the batch and inserting the rows. An infinite loop in INSERT INTO t1 SELECT FROM t1 might be avoided by the fact that it is using a locking read. (Related to that, see also MDEV-24813 .) Perhaps locking reads will ignore records that were inserted by the current transaction. This optimization should not affect that.

            --source include/have_innodb.inc
            create table t1 (v3 int primary key, v2 text(100) unique not null) engine=innodb;
            insert into t1 values ( -32768 , -128) , (-1 , 44);
            update t1 set v2 = 1;
            drop table t1;
            

            Above test case fails in 10.11-MDEV-16232 branch.

            In update_stmt(), InnoDB do start batch mini-transaction and holds the lock on clustered
            index record and secondary index records. Since this table has long unique blob
            (table->s->long_unique_table), we do clone the current handler and it leads
            to different prebuilt.

            (rr) where
            #0  0x0000563635d95fc5 in ha_innobase::open (this=0x7f66240163a0, name=0x7f66240919a8 "test/t1") at ../storage/innobase/handler/ha_innodb.cc:5959
            #1  0x000056363595c6ca in handler::ha_open (this=0x7f66240163a0, table_arg=0x7f66241e0238, name=0x7f66240919a8 "test/t1", mode=33, test_if_locked=2, mem_root=0x7f6624006d78, partitions_to_open=0x0) at ../sql/handler.cc:3349
            #2  0x000056363595bcb1 in handler::clone (this=0x7f6624238ef0, name=0x7f66240919a8 "test/t1", mem_root=0x7f6624006d78) at ../sql/handler.cc:3209
            #3  0x0000563635d96ce2 in ha_innobase::clone (this=0x7f6624238ef0, name=0x7f66241dfff0 "./test/t1", mem_root=0x7f6624006d78) at ../storage/innobase/handler/ha_innodb.cc:6237
            #4  0x000056363595bd4d in handler::create_lookup_handler (this=0x7f6624238ef0) at ../sql/handler.cc:3232
            #5  0x000056363596af02 in handler::prepare_for_insert (this=0x7f6624238ef0, do_create=true) at ../sql/handler.cc:7543
            #6  0x00005636356dc24a in mysql_update (thd=0x7f6624000db8, table_list=0x7f6624015580, fields=@0x7f6624005cc8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7f6624015e50, last = 0x7f6624015e50, elements = 1}, <No data fields>}, values=@0x7f6624006100: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7f6624015e60, last = 0x7f6624015e60, elements = 1}, <No data fields>}, conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7f6684058de0, updated_return=0x7f6684058ed0) at ../sql/sql_update.cc:1033
            #7  0x00005636355833b1 in mysql_execute_command (thd=0x7f6624000db8, is_called_from_prepared_stmt=false) at ../sql/sql_parse.cc:4410
            #8  0x000056363558f60a in mysql_parse (thd=0x7f6624000db8, rawbuf=0x7f66240154b0 "update t1 set v2 = 1", length=20, parser_state=0x7f6684059390) at ../sql/sql_parse.cc:8000
            #9  0x000056363557ba9d in dispatch_command (command=COM_QUERY, thd=0x7f6624000db8, packet=0x7f662400ba39 "update t1 set v2 = 1", packet_length=20, blocking=true) at ../sql/sql_parse.cc:1894
            #10 0x000056363557a45c in do_command (thd=0x7f6624000db8, blocking=true) at ../sql/sql_parse.cc:1407
            #11 0x000056363575b9b5 in do_handle_one_connection (connect=0x5636390913d8, put_in_cache=true) at ../sql/sql_connect.cc:1416
            #12 0x000056363575b716 in handle_one_connection (arg=0x563639175b18) at ../sql/sql_connect.cc:1318
            #13 0x0000563635cb72b5 in pfs_spawn_thread (arg=0x56363913d9e8) at ../storage/perfschema/pfs.cc:2201
            #14 0x00007f669bceb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #15 0x00007f669b8bc163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            After creating the new record, we do lookup in unique index with the cloned prebuilt:

            gdb) where
            #0  row_search_mvcc (buf=0x555556c08bee <_my_thread_var+24> "]\303\363\017\036\372UH\211\345H\203\354\020H\211}\370H\215\005i9u\001\213", mode=32767, prebuilt=0x1000056c08d2a, match_mode=140736959523632, direction=0) at ../storage/innobase/row/row0sel.cc:4347
            #1  0x00005555567667a2 in ha_innobase::index_read (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key_ptr=0x7fffe07ab63f "\364\001", key_len=8, find_flag=HA_READ_KEY_EXACT) at ../storage/innobase/handler/ha_innodb.cc:9004
            #2  0x000055555624da34 in handler::index_read_map (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key=0x7fffe07ab63f "\364\001", keypart_map=18446744073709551615, find_flag=HA_READ_KEY_EXACT) at ../sql/handler.h:3861
            #3  0x000055555632b23a in handler::ha_index_read_map (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key=0x7fffe07ab63f "\364\001", keypart_map=18446744073709551615, find_flag=HA_READ_KEY_EXACT) at ../sql/handler.cc:3488
            #4  0x000055555633731d in handler::check_duplicate_long_entry_key (this=0x7fffb023a560, new_rec=0x7fffb02292e8 "", key_no=1) at ../sql/handler.cc:7248
            #5  0x0000555556337c8a in handler::check_duplicate_long_entries_update (this=0x7fffb023a560, new_rec=0x7fffb02292e8 "") at ../sql/handler.cc:7381
            #6  0x0000555556339033 in handler::ha_update_row (this=0x7fffb023a560, old_data=0x7fffb0229300 "", new_data=0x7fffb02292e8 "") at ../sql/handler.cc:7664
            #7  0x00005555560b100a in mysql_update (thd=0x7fffb0000db8, table_list=0x7fffb0015560, fields=@0x7fffb0005ca8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffb0015e20, last = 0x7fffb0015e20, elements = 1}, <No data fields>}, values=@0x7fffb0006100: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffb0015e30, last = 0x7fffb0015e30, elements = 1}, <No data fields>}, conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7fffe07abe30, updated_return=0x7fffe07abf20) at ../sql/sql_update.cc:1092
            #8  0x0000555555f5ae92 in mysql_execute_command (thd=0x7fffb0000db8, is_called_from_prepared_stmt=false) at ../sql/sql_parse.cc:4411
            #9  0x0000555555f67119 in mysql_parse (thd=0x7fffb0000db8, rawbuf=0x7fffb0015490 "update t1 set v2 = 1", length=20, parser_state=0x7fffe07ac3a0) at ../sql/sql_parse.cc:8019
            #10 0x0000555555f5359a in dispatch_command (command=COM_QUERY, thd=0x7fffb0000db8, packet=0x7fffb000b9f9 "update t1 set v2 = 1", packet_length=20, blocking=true) at ../sql/sql_parse.cc:1894
            #11 0x0000555555f51f59 in do_command (thd=0x7fffb0000db8, blocking=true) at ../sql/sql_parse.cc:1407
            #12 0x000055555612ef32 in do_handle_one_connection (connect=0x555558b4c218, put_in_cache=true) at ../sql/sql_connect.cc:1416
            #13 0x000055555612ec93 in handle_one_connection (arg=0x555558be79f8) at ../sql/sql_connect.cc:1318
            #14 0x0000555556681cef in pfs_spawn_thread (arg=0x555558b4bd88) at ../storage/perfschema/pfs.cc:2201
            #15 0x00007ffff7f3e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #16 0x00007ffff7b0f163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            

            So update creates one mini-transaction(prebuilt), check_duplicate_long_entry_key() works on different prebuilt. This leads to block lock assertion.

            thiru Thirunarayanan Balathandayuthapani added a comment - --source include/have_innodb.inc create table t1 (v3 int primary key, v2 text(100) unique not null) engine=innodb; insert into t1 values ( -32768 , -128) , (-1 , 44); update t1 set v2 = 1; drop table t1; Above test case fails in 10.11- MDEV-16232 branch. In update_stmt(), InnoDB do start batch mini-transaction and holds the lock on clustered index record and secondary index records. Since this table has long unique blob (table->s->long_unique_table), we do clone the current handler and it leads to different prebuilt. (rr) where #0 0x0000563635d95fc5 in ha_innobase::open (this=0x7f66240163a0, name=0x7f66240919a8 "test/t1") at ../storage/innobase/handler/ha_innodb.cc:5959 #1 0x000056363595c6ca in handler::ha_open (this=0x7f66240163a0, table_arg=0x7f66241e0238, name=0x7f66240919a8 "test/t1", mode=33, test_if_locked=2, mem_root=0x7f6624006d78, partitions_to_open=0x0) at ../sql/handler.cc:3349 #2 0x000056363595bcb1 in handler::clone (this=0x7f6624238ef0, name=0x7f66240919a8 "test/t1", mem_root=0x7f6624006d78) at ../sql/handler.cc:3209 #3 0x0000563635d96ce2 in ha_innobase::clone (this=0x7f6624238ef0, name=0x7f66241dfff0 "./test/t1", mem_root=0x7f6624006d78) at ../storage/innobase/handler/ha_innodb.cc:6237 #4 0x000056363595bd4d in handler::create_lookup_handler (this=0x7f6624238ef0) at ../sql/handler.cc:3232 #5 0x000056363596af02 in handler::prepare_for_insert (this=0x7f6624238ef0, do_create=true) at ../sql/handler.cc:7543 #6 0x00005636356dc24a in mysql_update (thd=0x7f6624000db8, table_list=0x7f6624015580, fields=@0x7f6624005cc8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7f6624015e50, last = 0x7f6624015e50, elements = 1}, <No data fields>}, values=@0x7f6624006100: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7f6624015e60, last = 0x7f6624015e60, elements = 1}, <No data fields>}, conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7f6684058de0, updated_return=0x7f6684058ed0) at ../sql/sql_update.cc:1033 #7 0x00005636355833b1 in mysql_execute_command (thd=0x7f6624000db8, is_called_from_prepared_stmt=false) at ../sql/sql_parse.cc:4410 #8 0x000056363558f60a in mysql_parse (thd=0x7f6624000db8, rawbuf=0x7f66240154b0 "update t1 set v2 = 1", length=20, parser_state=0x7f6684059390) at ../sql/sql_parse.cc:8000 #9 0x000056363557ba9d in dispatch_command (command=COM_QUERY, thd=0x7f6624000db8, packet=0x7f662400ba39 "update t1 set v2 = 1", packet_length=20, blocking=true) at ../sql/sql_parse.cc:1894 #10 0x000056363557a45c in do_command (thd=0x7f6624000db8, blocking=true) at ../sql/sql_parse.cc:1407 #11 0x000056363575b9b5 in do_handle_one_connection (connect=0x5636390913d8, put_in_cache=true) at ../sql/sql_connect.cc:1416 #12 0x000056363575b716 in handle_one_connection (arg=0x563639175b18) at ../sql/sql_connect.cc:1318 #13 0x0000563635cb72b5 in pfs_spawn_thread (arg=0x56363913d9e8) at ../storage/perfschema/pfs.cc:2201 #14 0x00007f669bceb609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #15 0x00007f669b8bc163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 After creating the new record, we do lookup in unique index with the cloned prebuilt: gdb) where #0 row_search_mvcc (buf=0x555556c08bee <_my_thread_var+24> "]\303\363\017\036\372UH\211\345H\203\354\020H\211}\370H\215\005i9u\001\213", mode=32767, prebuilt=0x1000056c08d2a, match_mode=140736959523632, direction=0) at ../storage/innobase/row/row0sel.cc:4347 #1 0x00005555567667a2 in ha_innobase::index_read (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key_ptr=0x7fffe07ab63f "\364\001", key_len=8, find_flag=HA_READ_KEY_EXACT) at ../storage/innobase/handler/ha_innodb.cc:9004 #2 0x000055555624da34 in handler::index_read_map (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key=0x7fffe07ab63f "\364\001", keypart_map=18446744073709551615, find_flag=HA_READ_KEY_EXACT) at ../sql/handler.h:3861 #3 0x000055555632b23a in handler::ha_index_read_map (this=0x7fffb0016348, buf=0x7fffb02292e8 "", key=0x7fffe07ab63f "\364\001", keypart_map=18446744073709551615, find_flag=HA_READ_KEY_EXACT) at ../sql/handler.cc:3488 #4 0x000055555633731d in handler::check_duplicate_long_entry_key (this=0x7fffb023a560, new_rec=0x7fffb02292e8 "", key_no=1) at ../sql/handler.cc:7248 #5 0x0000555556337c8a in handler::check_duplicate_long_entries_update (this=0x7fffb023a560, new_rec=0x7fffb02292e8 "") at ../sql/handler.cc:7381 #6 0x0000555556339033 in handler::ha_update_row (this=0x7fffb023a560, old_data=0x7fffb0229300 "", new_data=0x7fffb02292e8 "") at ../sql/handler.cc:7664 #7 0x00005555560b100a in mysql_update (thd=0x7fffb0000db8, table_list=0x7fffb0015560, fields=@0x7fffb0005ca8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffb0015e20, last = 0x7fffb0015e20, elements = 1}, <No data fields>}, values=@0x7fffb0006100: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x7fffb0015e30, last = 0x7fffb0015e30, elements = 1}, <No data fields>}, conds=0x0, order_num=0, order=0x0, limit=18446744073709551615, ignore=false, found_return=0x7fffe07abe30, updated_return=0x7fffe07abf20) at ../sql/sql_update.cc:1092 #8 0x0000555555f5ae92 in mysql_execute_command (thd=0x7fffb0000db8, is_called_from_prepared_stmt=false) at ../sql/sql_parse.cc:4411 #9 0x0000555555f67119 in mysql_parse (thd=0x7fffb0000db8, rawbuf=0x7fffb0015490 "update t1 set v2 = 1", length=20, parser_state=0x7fffe07ac3a0) at ../sql/sql_parse.cc:8019 #10 0x0000555555f5359a in dispatch_command (command=COM_QUERY, thd=0x7fffb0000db8, packet=0x7fffb000b9f9 "update t1 set v2 = 1", packet_length=20, blocking=true) at ../sql/sql_parse.cc:1894 #11 0x0000555555f51f59 in do_command (thd=0x7fffb0000db8, blocking=true) at ../sql/sql_parse.cc:1407 #12 0x000055555612ef32 in do_handle_one_connection (connect=0x555558b4c218, put_in_cache=true) at ../sql/sql_connect.cc:1416 #13 0x000055555612ec93 in handle_one_connection (arg=0x555558be79f8) at ../sql/sql_connect.cc:1318 #14 0x0000555556681cef in pfs_spawn_thread (arg=0x555558b4bd88) at ../storage/perfschema/pfs.cc:2201 #15 0x00007ffff7f3e609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #16 0x00007ffff7b0f163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 So update creates one mini-transaction(prebuilt), check_duplicate_long_entry_key() works on different prebuilt. This leads to block lock assertion.
            marko Marko Mäkelä added a comment - - edited

            I conducted a small performance test, with Sysbench oltp_read_only and oltp_update_non_index, 4×10000 rows, 96 threads, 30 seconds on /dev/shm, on a dual socket system with 40 threads (2×10 without hyperthreading).

            revision oltp_read_only/qps oltp_update_non_index/tps
            10.11 (baseline) 324540.53 212804.57
            Thiru’s prototype 324430.26 211264.99
            removing row_prebuilt_t::fetch_cache 297972.85 211938.88

            To my surprise, removing the row_prebuilt_t::fetch_cache, which was one of my motivations for this task, is reducing performance. We will need to find out why.

            Before running any further UPDATE or DELETE benchmarks, I think that thiru needs to revise some conditions around the sel_set_rec_lock() calls in row_search_mvcc(), so that DELETE and UPDATE will rely more on implicit locking, like INSERT does. For secondary indexes, some gap locks will be necessary due to reasons explained in MDEV-16406.

            marko Marko Mäkelä added a comment - - edited I conducted a small performance test, with Sysbench oltp_read_only and oltp_update_non_index , 4×10000 rows, 96 threads, 30 seconds on /dev/shm , on a dual socket system with 40 threads (2×10 without hyperthreading). revision oltp_read_only /qps oltp_update_non_index /tps 10.11 (baseline) 324540.53 212804.57 Thiru’s prototype 324430.26 211264.99 removing row_prebuilt_t::fetch_cache 297972.85 211938.88 To my surprise, removing the row_prebuilt_t::fetch_cache , which was one of my motivations for this task, is reducing performance. We will need to find out why. Before running any further UPDATE or DELETE benchmarks, I think that thiru needs to revise some conditions around the sel_set_rec_lock() calls in row_search_mvcc() , so that DELETE and UPDATE will rely more on implicit locking, like INSERT does. For secondary indexes, some gap locks will be necessary due to reasons explained in MDEV-16406 .

            People

              debarun Debarun Banerjee
              marko Marko Mäkelä
              Votes:
              5 Vote for this issue
              Watchers:
              21 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.