Ubuntu 17.04 (Zesty Zapus), but I assume that the OS is not important.
Description
Sorry, but the frontend did not allow me to fill in the MariaDB versions used.
No replay on 10.1.35 commit 1d10c9afe0f2f4fba73892e6c12ea6efe90d5931.
Replay on
Matthias Leich added a comment (edited):

Bug reports with similar content: MDEV-15326, but with a slightly different backtrace.

Simplified RQG YY grammar:

query:
    # The ALTER TABLE t1 is required for replay, but what is that doing at all?
    ALTER TABLE t1 |
    DELETE FROM t1 WHERE col1 = 1 OR col1 IS NULL ; INSERT INTO t1 (col1) VALUES (1) ;

query_init:
    start_delay ; CREATE TABLE IF NOT EXISTS t1 (col1 INT) ENGINE = InnoDB ; SET AUTOCOMMIT = 0;

start_delay:
    { sleep 5; return undef };

The bug seems to be a concurrency problem, because attempts with one session only did not replay. With 8 concurrent sessions I usually need one attempt lasting ~60 s.
Matthias Leich added a comment (edited):

Marko gave the hint to try the test with innodb_lock_schedule_algorithm=fcfs. None of the ~170 RQG runs with the grammar above replayed the problem; all runs passed.
Marko Mäkelä added a comment:

MDEV-11039 merged a contributed change that changed the InnoDB lock conflict resolution and introduced a parameter innodb_lock_schedule_algorithm=vats. The old behaviour is available by setting innodb_lock_schedule_algorithm=fcfs.

If this problem is not repeatable with innodb_lock_schedule_algorithm=fcfs, it seems to be a regression caused by MDEV-11039.
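As a rough sketch of the difference between the two settings, assuming only the age-based reordering described later in this ticket (cf. lock_rec_insert_by_trx_age()); the struct and the comparators are illustrative, not the MDEV-11039 code:

// Toy model, for illustration only. Under fcfs, waiting record lock
// requests are granted strictly in arrival order. Under vats, a waiter
// may be reordered by the age of its transaction, so a long-waiting
// transaction can jump ahead of requests that arrived earlier.
struct LockRequest {
    unsigned arrival_seq; // position in arrival order (fcfs key)
    unsigned trx_age;     // accumulated lock wait time (vats key)
};

static bool fcfs_before(const LockRequest &a, const LockRequest &b) {
    return a.arrival_seq < b.arrival_seq;
}

static bool vats_before(const LockRequest &a, const LockRequest &b) {
    return a.trx_age > b.trx_age; // the older transaction goes first
}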
Marko Mäkelä added a comment:

I posted feedback about this to the MDEV-11039 pull request #248, including the following:

In MySQL 8.0, I found the following bug fixes related to this:
- mysql/mysql-server@f395242 (a test case refers to MySQL Bug #89829, which I cannot access)
- mysql/mysql-server@2248a3e, mysql/mysql-server@c917825 (these could be addressing a similar issue as MDEV-16664)
- mysql/mysql-server@c1990af (a memory management issue, apparently specific to the MySQL 8.0 implementation; in MariaDB, lock_grant_and_move_on_page(), which corresponds to the MySQL 8.0 lock_grant_vats(), does not allocate memory)

It looks like MySQL 8.0 employs a threshold based on lock_sys->n_waiting < LOCK_VATS_THRESHOLD (defined as 32), so it would be dynamically choosing between the algorithms FCFS and VATS (also called CATS by them) based on load. This could make it harder to repeat the MDEV-16664 failure scenario in MySQL 8.0.
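A minimal sketch of such a load-based switch, assuming only what is stated above (a waiting-lock counter lock_sys->n_waiting and LOCK_VATS_THRESHOLD defined as 32); this is not the actual MySQL 8.0 code:

// Illustration only: with few waiters, plain FCFS is cheap and fair
// enough; only under heavy contention is VATS/CATS reordering used.
constexpr unsigned LOCK_VATS_THRESHOLD = 32;

static bool use_vats(unsigned n_waiting) {
    return n_waiting >= LOCK_VATS_THRESHOLD;
}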
For the record, MySQL 8.0 refers to the contribution as WL#10793, MySQL Bug #84266, pull request #115, and mysql/mysql-server@fb056f4.
Matthias Leich added a comment:

Similar fast replay with 10.1.35-MariaDB-debug (InnoDB: Percona XtraDB (http://www.percona.com) 5.6.39-83.1) and innodb_lock_schedule_algorithm=vats.
https://jira.mariadb.org/browse/MDEV-11039 affected 10.1.19, 10.2.3, 10.3.0.
Marko Mäkelä added a comment:

In MariaDB Server 10.2.17 and 10.3.9, we will revert to the old algorithm (innodb_lock_schedule_algorithm=fcfs) until this problem has been isolated and fixed.
Marko Mäkelä added a comment:

This looks very similar to MDEV-14552.

Also, sachin.setiya.007 says that he can reproduce this assertion failure by repeating a single test enough times:

rpl.rpl_parallel_optimistic_nobinlog 'innodb,stmt' w1 [ fail ]
Test ended at 2018-09-30 14:09:20

CURRENT_TEST: rpl.rpl_parallel_optimistic_nobinlog
mysqltest: In included file "./include/sync_with_master_gtid.inc":
included from /home/sachin/10.2/mysql-test/suite/rpl/t/rpl_parallel_optimistic_nobinlog.test at line 59:
At line 43: query 'let $_result= `SELECT master_gtid_wait('$master_pos', $_slave_timeout)`' failed: 2013: Lost connection to MySQL server during query
Matthias Leich added a comment:

The uploaded other_lock.tgz contains
- a rqg_batch.pl config file, other_lock_assert.cc
- the RQG grammar other_lock_assert.yy, because that grammar is not and will not be pushed upstream

Running an RQG test campaign according to other_lock_assert.cc revealed the following two bugs on 10.3 (2019-08):
- the current bug, https://jira.mariadb.org/browse/MDEV-16664
- https://jira.mariadb.org/browse/MDEV-15326

Both bugs show: InnoDB: Failing assertion: !other_lock || wsrep_thd_is_BF ...
Marko Mäkelä added a comment:

I am setting this to Blocker priority for the 10.6 release, with the intention that if this cannot be fixed before the 10.6.0 release, we will remove the feature.

If the bug can be fixed, the fix should be applied to 10.1 and later releases.
Marko Mäkelä added a comment:

In MySQL 8.0, there have been some fixes to CATS (as the feature is called there). The latest one as of this writing appears to be a complete rewrite of CATS in MySQL 8.0.20.
Jan Lindström added a comment (edited):

Analysis: There are basically two cases:

- lock_rec_add_to_queue: here we are adding a new record lock request to the queue.
- lock_rec_queue_validate: here we validate the lock request queue of a single record.

In both cases we have a condition like

if (!(type_mode & (LOCK_WAIT | LOCK_GAP))) {

This means that the lock request carries neither the WAIT flag nor the GAP flag. For VATS and Galera, the WAIT flag is the interesting one. Traditionally, InnoDB used First-Come-First-Served scheduling. Under FCFS, if a new lock request does not have the WAIT flag, there cannot be any other conflicting lock request for the same record: if there had been one, this lock request would not have been granted; instead, the WAIT flag would have been set and the request would have been added to the end of the queue.

However, Galera Brute-Force (BF) and VATS may schedule differently.

BF: brute-force transactions should not wait for any locks. There are two ways of trying to avoid that:
- If the record that we want to lock is free, we add the lock request to the front of the queue, even when a conflicting lock request arrived before this one. This leads to a situation where the new lock request does not have the WAIT flag while the queue holds another conflicting lock waiting for its turn.
- If a conflicting lock on the same record has already been granted, we kill that transaction to let the BF transaction continue.

VATS: here, record lock requests may be reordered based on how long the transaction has been waiting for locks. This means that whenever a long-running transaction makes a new lock request, it may get its turn to lock before other waiting conflicting lock requests. Thus, this lock request does not have the WAIT flag even though the queue may hold other conflicting lock requests waiting for their turn.

After a second analysis, it seems that we might order a new lock request ahead of an already granted lock request in the queue; that is naturally a bug.
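To make the FCFS invariant behind the assertion concrete, here is a minimal self-contained sketch; the types and the conflict rule form a toy model, not the InnoDB data structures:

#include <cassert>
#include <vector>

enum { LOCK_WAIT = 1, LOCK_GAP = 2 };
struct Lock { int type_mode; int trx_id; bool exclusive; };

// Toy conflict rule: different transactions, at least one exclusive.
static bool locks_conflict(const Lock &a, const Lock &b) {
    return a.trx_id != b.trx_id && (a.exclusive || b.exclusive);
}

// FCFS invariant: a granted request (no WAIT flag) implies that no
// conflicting lock exists in the queue of the same record. BF and
// VATS can violate this; that is the reported assertion failure.
static void validate_granted(const Lock &lock, const std::vector<Lock> &queue) {
    if (lock.type_mode & (LOCK_WAIT | LOCK_GAP)) return;
    for (const Lock &other : queue)
        assert(&other == &lock || !locks_conflict(lock, other));
}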
Jan Lindström added a comment:

marko, I leave it to you to decide whether we take whatever MySQL did for 8.0 or whether we just fix the debug assertion. In my opinion, the new implementation is quite a big change for GA releases.
Marko Mäkelä added a comment:

Because MDEV-12837 already hard-wired Galera to use innodb_lock_schedule_algorithm=FCFS, the observations on the Galera transaction certification process (which may abort other transactions) should not be very relevant here.

The assertion has been ‘watered down’ for Galera. The original assertion expression is simply !other_lock. This bug was repeated without enabling any Galera replication. What matters is that other_lock is set, that is, a conflicting lock exists when it is not expected to. This problem is specific to innodb_lock_schedule_algorithm=VATS only.

If there are no resources to fix the problematic VATS implementation, I think that the most meaningful course of action is to remove it from the MariaDB 10.6 release.

Note: MariaDB 10.2 also inherited a similar bug from MySQL 5.7, which we fixed in MDEV-15326. That one was due to a race condition in transaction commit, which basically failed to release all implicit locks atomically when the transaction state changed to committed.
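For reference, the shape of the check discussed above, paraphrased from the assertion message quoted in this ticket (the exact source line may differ between versions):

/* Paraphrase only. Without Galera, the wsrep_thd_is_BF() terms are
   false, and the check degenerates to the original ut_a(!other_lock). */
ut_a(!other_lock
     || wsrep_thd_is_BF(lock->trx->mysql_thd, FALSE)
     || wsrep_thd_is_BF(other_lock->trx->mysql_thd, FALSE));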
Marko Mäkelä added a comment:

I believe that the ALTER TABLE statement in the grammar that mleich posted could be replaced with COMMIT. I tried to create an MTR test case based on the RQG grammar, but my test failed to reproduce the assertion failure on a recent 10.5 branch.
Marko Mäkelä added a comment:

I analyzed a recent trace of this. We have two DELETE statements executing the locking read part, and both are being granted an exclusive lock on the same record (in a clustered index root page). The lock breakage appears to occur in the following section of the function lock_rec_insert_by_trx_age():

if (lock_get_wait(in_lock) && !lock_rec_has_to_wait_in_queue(in_lock)) {
    lock_grant_have_trx_mutex(in_lock);

Right before the call, lock_rec_enqueue_waiting() removed the lock from the hash table:

if (innodb_lock_schedule_algorithm
    == INNODB_LOCK_SCHEDULE_ALGORITHM_VATS
    && !prdt
    && !thd_is_replication_slave_thread(lock->trx->mysql_thd)) {
    HASH_DELETE(lock_t, hash, &lock_sys.rec_hash,
                lock_rec_lock_fold(lock), lock);
    dberr_t res = lock_rec_insert_by_trx_age(lock);

In lock_rec_has_to_wait_in_queue(), we will find no conflicting lock, even though the lock that was acquired by the first thread still exists:

for (lock = lock_rec_get_first_on_page_addr(hash, space, page_no);
     lock != wait_lock;
     lock = lock_rec_get_next_on_page_const(lock)) {

In that loop, we fail to invoke lock_has_to_wait(wait_lock, lock) on the other_lock object that will later be flagged by the assertion failure in lock_rec_queue_validate(), invoked by lock_clust_rec_read_check_and_lock(), even though that other_lock remains granted in LOCK_X mode on the same record.

At the time the conflicting LOCK_X other_lock was granted to the other DELETE, our DELETE was waiting for block->lock on the clustered index root page. You definitely should not need a big table to repeat this, because the clustered index consists of the root page only. The test case does not involve any secondary indexes. PAGE_N_RECS is 6 at the time of the failure, but there are some PAGE_GARBAGE records in the page.

I suspect that the bug is at the end of lock_rec_insert_by_trx_age().
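The following self-contained toy model illustrates why the scan can miss the conflict after reordering; it reflects my reading of the trace, not actual InnoDB code:

#include <cstdio>
#include <vector>

struct Lock { int trx; bool granted; };

// Mimics lock_rec_has_to_wait_in_queue(): only the locks that precede
// wait_lock in the chain are examined.
static bool has_to_wait_in_queue(const std::vector<Lock> &chain, size_t wait_pos) {
    for (size_t i = 0; i < wait_pos; i++)
        if (chain[i].granted && chain[i].trx != chain[wait_pos].trx)
            return true; // conflicting granted lock found
    return false;
}

int main() {
    // FCFS order: granted LOCK_X of trx 1 first, waiter of trx 2 behind it.
    std::vector<Lock> fcfs = {{1, true}, {2, false}};
    std::printf("FCFS: must wait = %d\n", has_to_wait_in_queue(fcfs, 1)); // 1

    // After age-based reordering, the waiter sits in front of the granted
    // lock; the scan stops immediately and never sees the conflict.
    std::vector<Lock> vats = {{2, false}, {1, true}};
    std::printf("VATS: must wait = %d\n", has_to_wait_in_queue(vats, 0)); // 0
    return 0;
}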
Marko Mäkelä added a comment:

MariaDB Server 10.2, 10.3, 10.4, and 10.5 will issue a warning message if the parameter is set to innodb_lock_schedule_algorithm=VATS, but the buggy behaviour will not be removed.

The parameter innodb_lock_schedule_algorithm will be removed in MariaDB Server 10.6.
Marko Mäkelä added a comment (edited):

Starting with MariaDB Server 10.2.35, 10.3.26, 10.4.16, and 10.5.7, a deprecation and corruption warning will be issued if the server is started up with innodb_lock_schedule_algorithm=VATS.

Starting with MariaDB Server 10.6, the parameter innodb_lock_schedule_algorithm will be removed.
Marko Mäkelä added a comment:

jplindst, I think that it is confusing to reopen a bug that we already used for removing (in 10.6) and deprecating a feature that did not work. It might be better to file a separate ticket for fixing the crash.

Please rebase your work onto a branch that includes a fix of MDEV-23897, and include a deterministic test case. Based on my analysis of the rr record trace, it should be relatively easy to create a DEBUG_SYNC test case that repeats the problem. We have a few DEBUG_SYNC points in the btr and row subdirectories that could be useful. Hopefully, at most one new sync point would have to be added.
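For illustration, this is the general shape of a DEBUG_SYNC based test; the sync point name after_lock_enqueue is hypothetical, not an existing point in the source tree:

/* Server side (debug builds): a sync point is a single macro call,
   placed at the step where a session should be parked. */
DEBUG_SYNC(thd, "after_lock_enqueue");

/* Test side: one connection arms the point,
     SET DEBUG_SYNC = 'after_lock_enqueue SIGNAL parked WAIT_FOR go';
   and another connection coordinates with it:
     SET DEBUG_SYNC = 'now WAIT_FOR parked';
     SET DEBUG_SYNC = 'now SIGNAL go'; */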
Marko Mäkelä added a comment:

MDEV-23877 was filed for documenting the deprecation of the parameter innodb_lock_schedule_algorithm.

This ticket will be used for tracking the originally reported corruption that is caused by the setting innodb_lock_schedule_algorithm=VATS. Once the feature is fixed to work correctly, and if it has been demonstrated to improve performance, it can be permanently enabled in a later version. The code does not exist in the current MariaDB 10.6 development branch.
Jan Lindström added a comment:

The attached test case produces a lock wait queue exactly as in the crash case, but the actual crash still does not reproduce, so something more is needed.
Marko Mäkelä added a comment:

While working on MDEV-20612, I noticed that, possibly related to the MySQL 8.0 variant of this (called CATS), the deadlock detection was refactored and moved into a dedicated thread.
Jan Lindström added a comment:

Unfortunately, I have not found a test case that would reproduce this, and my guess fixes do not work. Therefore, based on my current knowledge, I must say that I do not know how to fix this issue. If we want, maybe we should do the same as MySQL 8.0 did (though that change might be out of scope for 10.6).