[MDEV-15417] Two max values of a same column are not equal in one select statement Created: 2018-02-25  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2.12, 10.2, 10.3
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: percy Assignee: Oleksandr Byelkin
Resolution: Unresolved Votes: 0
Labels: innodb
Environment:

OS: CentOS Linux release 7.4.1708 (Core)
Kernel: 3.10.0-693.11.1.el7.x86_64
DB: MariaDB-server-10.2.12-1.el7.centos.x86_64



 Description   

Create table and Insert initial data.

create table customers (
    company_id bigint not null,
    customer_id bigint not null,
    primary key (company_id, customer_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
insert into customers (company_id, customer_id) values
(1, 1), (1, 2), (1, 3);

Set transaction isolation level 'READ COMMITTED' because Deadlock occurs in 'REPEATABLE READ' mode.

tx1) set transaction isolation level read committed;
tx2) set transaction isolation level read committed;
tx1) start transaction;
tx2) start transaction;
tx1) select max(customer_id), max(customer_id) from customers where company_id = 1 for update;
+------------------+------------------+
| max(customer_id) | max(customer_id) |
+------------------+------------------+
|                3 |                3 |
+------------------+------------------+
tx2) select max(customer_id), max(customer_id) from customers where company_id = 1 for update;

Transaction 2 is blocked.

tx1) insert into customers (company_id, customer_id) values (1, 4);
tx1) commit;
 
tx2)
+------------------+------------------+
| max(customer_id) | max(customer_id) |
+------------------+------------------+
|                3 |                4 |
+------------------+------------------+

After commit Transaction 1, Select query in Transaction 2 is executed. But two max values are not equal.

Expected result is:

+------------------+------------------+
| max(customer_id) | max(customer_id) |
+------------------+------------------+
|                4 |                4 |
+------------------+------------------+



 Comments   
Comment by Marko Mäkelä [ 2018-03-07 ]

I did not test this yet, but I have a wild guess: InnoDB is incorrectly starting a new read view for the second MAX evaluation, even though READ COMMITTED should only be created once at the start of the transaction. Maybe the code in ha_innobase::external_lock() and ha_innobase::store_lock() should be refactored, and there should be some per-transaction (handlerton) function to signal the start and end of statements.

In the REPEATABLE READ isolation level, the same read view should continue to exist until the end of the transaction.

Another possibility is that the code for ‘semi-consistent read’ (unlocking non-matching rows) could somehow interfere. I think that this is unlikely, because there is no WHERE condition.

(Another mismatch in the InnoDB interface, in the other direction, is that the trx_t::duplicates flag should be in ha_innobase::prebuilt, that is, specific to each table handle. This could affect REPLACE or INSERT…ON DUPLICATE KEY UPDATE if these touch multiple tables, for example via FOREIGN KEY…CASCADE constraints.)

Comment by Elena Stepanova [ 2018-03-07 ]

Thanks for the report. Reproducible as described on all of MariaDB 5.5-10.3 and MySQL 5.5-8.0. Based on the above comment, setting to 'Confirmed'.

MTR test case, which implements the described scenario verbatim. Please note that it uses a sleep, otherwise it's a subject to race condition (technically it still is, but for the purpose of reproducing sleep 1 should be enough). Don't put it in the regression suite with the sleep, replace with something deterministic.

--source include/have_innodb.inc
 
create table customers (
    company_id bigint not null,
    customer_id bigint not null,
    primary key (company_id, customer_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
insert into customers (company_id, customer_id) values
(1, 1), (1, 2), (1, 3);
 
set transaction isolation level read committed;
 
--connect (con1,localhost,root,,)
 
set transaction isolation level read committed;
 
--connection default
 
start transaction;
 
--connection con1
 
start transaction;
 
--connection default
 
select max(customer_id), max(customer_id) from customers where company_id = 1 for update;
 
--connection con1
--send
  select max(customer_id), max(customer_id) from customers where company_id = 1 for update;
--sleep 1
 
--connection default
insert into customers (company_id, customer_id) values (1, 4);
commit;
 
--connection con1
--reap
 
# Cleanup
--disconnect con1
--connection default
drop table customers;

Comment by Marko Mäkelä [ 2019-04-30 ]

In InnoDB, a locking read is essentially READ COMMITTED. If the lock acquisition succeeds, then the newest value will be returned.

In the test case, the INSERT will arrive between the two reads of MAX(customer_id).

For the first SELECT, we do acquire a record lock during the first call. I verified it:

(gdb) p *prebuilt->trx->lock.trx_locks.end
$16 = {trx = 0x7ffff20d8148, trx_locks = {prev = 0x7ffff20d8cf0, next = 0x0}, 
  index = 0x7fff8401e728, hash = 0x0, requested_time = 0, wait_time = 0, 
  un_member = {tab_lock = {table = 0x300000004, locks = {prev = 0x48, 
        next = 0x0}}, rec_lock = {space = 4, page_no = 3, n_bits = 72}}, 
  type_mode = 1059}
(gdb) p *(ulint*)(&prebuilt->trx->lock.rec_pool[0].lock+1)
$26 = 16

This corresponds to heap_no=4, which is the third inserted user record, that is, (1,3) in our case.

The bug here appears to be that the second SELECT is being invoked on a different prebuilt->trx object, and hence the locks that were acquired by the first SELECT no longer matter. Here is the stack trace for the offending trx switch:

#0  row_update_prebuilt_trx (prebuilt=0x7fff8c02b008, trx=0x7ffff20d9278)
    at /mariadb/10.2/storage/innobase/row/row0mysql.cc:1060
#1  0x0000555555dfa486 in ha_innobase::update_thd (this=0x7fff8c01de00, 
    thd=0x7fff80000d28)
    at /mariadb/10.2/storage/innobase/handler/ha_innodb.cc:3045
#2  0x0000555555dec403 in ha_innobase::external_lock (this=0x7fff8c01de00, 
    thd=0x7fff80000d28, lock_type=1)
    at /mariadb/10.2/storage/innobase/handler/ha_innodb.cc:15770
#3  0x0000555555c2ac9f in handler::ha_external_lock (this=0x7fff8c01de00, 
    thd=0x7fff80000d28, lock_type=1) at /mariadb/10.2/sql/handler.cc:5981
#4  0x0000555555d1636a in lock_external (thd=<optimized out>, 
    tables=<optimized out>, count=1) at /mariadb/10.2/sql/lock.cc:395
#5  mysql_lock_tables (thd=0x7fff80000d28, sql_lock=0x7fff80011860, 
    flags=<optimized out>) at /mariadb/10.2/sql/lock.cc:340
#6  0x0000555555d15bb1 in mysql_lock_tables (thd=<optimized out>, 
    tables=0x7fff80011858, count=1, flags=0) at /mariadb/10.2/sql/lock.cc:303
#7  0x00005555559f65a6 in lock_tables (thd=0x7fff80000d28, 
    tables=0x7fff80010e90, count=<optimized out>, flags=0)
    at /mariadb/10.2/sql/sql_base.cc:5010
#8  0x00005555559f5fa3 in open_and_lock_tables (thd=0x7fff80000d28, 
    options=..., tables=0x7fff80010e90, derived=true, flags=0, 
    prelocking_strategy=0x7ffff14bbe70) at /mariadb/10.2/sql/sql_base.cc:4800
#9  0x0000555555a50fe4 in open_and_lock_tables (thd=<optimized out>, 
    tables=<optimized out>, derived=255, flags=0)
    at /mariadb/10.2/sql/sql_base.h:504

Then, between the 3rd and 4th invocation of row_search_mvcc(), the transaction of the first SELECT is committed:

(gdb) bt
#0  lock_trx_release_locks (trx=<optimized out>)
    at /mariadb/10.2/storage/innobase/lock/lock0lock.cc:6675
#1  0x0000555555fa7872 in trx_commit_in_memory (trx=<optimized out>, 
    mtr=<optimized out>, serialised=<optimized out>)
    at /mariadb/10.2/storage/innobase/trx/trx0trx.cc:1667
#2  trx_commit_low (trx=<optimized out>, mtr=<optimized out>)
    at /mariadb/10.2/storage/innobase/trx/trx0trx.cc:1884
#3  0x0000555555fa9007 in trx_commit (trx=0x7ffff20d8148)
    at /mariadb/10.2/storage/innobase/trx/trx0trx.cc:1908
#4  0x0000555555faa664 in trx_commit_for_mysql (trx=0x7ffff20d8148)
    at /mariadb/10.2/storage/innobase/trx/trx0trx.cc:2117
#5  0x0000555555dd986c in innobase_commit_low (trx=0x7ffff20d8148)
    at /mariadb/10.2/storage/innobase/handler/ha_innodb.cc:4530
#6  0x0000555555df3856 in innobase_commit_ordered_2 (trx=0x7ffff20d8148, 
    thd=<optimized out>)
    at /mariadb/10.2/storage/innobase/handler/ha_innodb.cc:4647
#7  0x0000555555deccb6 in innobase_commit (hton=<optimized out>, 
    thd=0x7fff8c000cf8, commit_trx=true)
    at /mariadb/10.2/storage/innobase/handler/ha_innodb.cc:4763
#8  0x0000555555c209b5 in commit_one_phase_2 (thd=0x7fff8c000cf8, all=true, 
    trans=0x7fff8c003fc0, is_real_trans=<optimized out>)
    at /mariadb/10.2/sql/handler.cc:1570
#9  0x0000555555c208e0 in ha_commit_one_phase (thd=0x7fff8c000cf8, all=true)
    at /mariadb/10.2/sql/handler.cc:1551
#10 0x0000555555c204d3 in ha_commit_trans (thd=0x7fff8c000cf8, all=true)
    at /mariadb/10.2/sql/handler.cc:1418
#11 0x0000555555b4d27b in trans_commit (thd=0x7fff8c000cf8) at /mariadb/10.2/sql/transaction.cc:307
#12 0x0000555555a4b456 in mysql_execute_command (thd=0x7fff8c000cf8) at /mariadb/10.2/sql/sql_parse.cc:5562
#13 0x0000555555a44fd3 in mysql_parse (thd=0x7fff8c000cf8, rawbuf=0x7fff8c011960 "commit", length=<optimized out>, parser_state=0x7ffff1508310, is_com_multi=<optimized out>, is_next_command=false) at /mariadb/10.2/sql/sql_parse.cc:8013
#14 0x0000555555a41d96 in dispatch_command (command=<optimized out>, thd=0x7fff8c000cf8, packet=0x7fff8c008489 "commit", packet_length=2348882272, is_com_multi=false, is_next_command=false) at /mariadb/10.2/sql/sql_parse.cc:1832
#15 0x0000555555a43a08 in do_command (thd=0x7fff8c000cf8) at /mariadb/10.2/sql/sql_parse.cc:1386
#16 0x0000555555b3d730 in do_handle_one_connection (connect=<optimized out>) at /mariadb/10.2/sql/sql_connect.cc:1335
#17 0x0000555555b3d443 in handle_one_connection (arg=0x5555579f6918) at /mariadb/10.2/sql/sql_connect.cc:1241
#18 0x000055555622f4e1 in pfs_spawn_thread (arg=0x5555579a6028) at /mariadb/10.2/storage/perfschema/pfs.cc:1862
#19 0x00007ffff7f8bfa3 in ?? ()
#20 0x0000000000000000 in ?? ()

This allows the INSERT to run between the two MAX of the second SELECT.

Both the transaction switch and the commit are wrongly triggered from the SQL layer. InnoDB is only the victim, not the culprit.

Generated at Thu Feb 08 08:21:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.