[MDEV-16242] MyRocks: parallel slave on a table without PK can stop with ER_KEY_NOT_FOUND Created: 2018-05-22  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Replication, Storage Engine - RocksDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-16428 Simple concurrent DML on RocksDB tabl... Confirmed

 Description   

Note: Run the test case with RocksDB and slave-parallel-threads=2 (or more). If it doesn't fail right away, also run with repeat=N.

# Run with --mysqld=--plugin-load-add=ha_rocksdb --mysqld=--slave-parallel-threads=2 --repeat=20
 
--source include/have_binlog_format_row.inc
--source include/master-slave.inc
 
CREATE TABLE t1 (a INT) ENGINE=RocksDB;
INSERT INTO t1 VALUES (1),(2),(3),(4);
 
CREATE TABLE t2 (a INT) ENGINE=RocksDB SELECT * FROM t1;
INSERT INTO t2 VALUES (5),(6),(7),(8);
DELETE FROM t2;
 
--sync_slave_with_master
 
# Cleanup
--connection master
DROP TABLE t1, t2;
--source include/rpl_end.inc

10.2 afe5a51c2df95

2018-05-22 14:56:42 139894937147136 [ERROR] mysqld: Can't find record in 't2'
2018-05-22 14:56:42 139894937147136 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table test.t2; Can't find record in 't2', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1514, Gtid 0-1-5, Internal MariaDB error code: 1032
2018-05-22 14:56:42 139894937147136 [Warning] Slave: Can't find record in 't2' Error_code: 1032
2018-05-22 14:56:42 139894937147136 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 1317

Notes:

  • I don't see anything obvious in the binary log that could have caused it.
  • Not reproducible without parallel slave, or with parallel-slave-threads=1.
  • Not reproducible with InnoDB instead of RocksDB.


 Comments   
Comment by Sergei Petrunia [ 2018-05-24 ]

Added printouts and this is what I get from a failing run:

 140070159607616> DBImpl::WriteImpl wrote a batch with sequence=1
 140070159607616> DBImpl::WriteImpl wrote a batch with sequence=1
 
 thread1> ha_rockdb(0x7f641c014068, t1)::create
 thread1> DBImpl::WriteImpl wrote a batch with sequence=3
 thread1> DBImpl::WriteImpl wrote a batch with sequence=4

 thread2> Transaction got snapshot 5 
 thread2> Transaction got snapshot 5 
 thread2> ha_rockdb(0x7f6418014ee8, t1)::write_row()= 0
 thread2> ha_rockdb(0x7f6418014ee8, t1)::write_row()= 0
 thread2> ha_rockdb(0x7f6418014ee8, t1)::write_row()= 0
 thread2> ha_rockdb(0x7f6418014ee8, t1)::write_row()= 0
 thread2> DBImpl::WriteImpl wrote a batch with sequence=6

// CREATE TABLE t2 (a INT) ENGINE=RocksDB SELECT * FROM t1 starts:

 thread2> ha_rockdb(0x7f6418034468, t2)::create
 thread2> DBImpl::WriteImpl wrote a batch with sequence=10
 thread2> DBImpl::WriteImpl wrote a batch with sequence=11
 thread2> Transaction got snapshot 12 
 thread2> Transaction got snapshot 12 
 thread2> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread2> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread2> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread2> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread2> DBImpl::WriteImpl wrote a batch with sequence=13

// INSERT INTO t2 VALUES (5),(6),(7),(8) starts:

 thread1> Transaction got snapshot 16 
 thread1> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread1> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread1> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0
 thread1> ha_rockdb(0x7f64180373d8, t2)::write_row()= 0

// DELETE starts:

 thread2>    Rows_log_event::find_row(rgi=0x7f6424024060): scan
 thread2> Transaction got snapshot 16 
 thread2>      rnd_next()=0
 thread2>  Delete_rows_log_event::do_exec_row(rgi=0x7f6424024060): row found 
 thread2>  ha_rockdb(0x7f64180373d8, t2)::delete_row() = 0
 thread2>    Rows_log_event::find_row(rgi=0x7f6424024060): scan
 thread2>      rnd_next()=0
 thread2>  Delete_rows_log_event::do_exec_row(rgi=0x7f6424024060): row found 
 thread2>  ha_rockdb(0x7f64180373d8, t2)::delete_row() = 0
 thread2>    Rows_log_event::find_row(rgi=0x7f6424024060): scan
 thread2>      rnd_next()=0
 thread2>  Delete_rows_log_event::do_exec_row(rgi=0x7f6424024060): row found 
 thread2>  ha_rockdb(0x7f64180373d8, t2)::delete_row() = 0
 thread2>    Rows_log_event::find_row(rgi=0x7f6424024060): scan
 thread2>      rnd_next()=0
 thread2>  Delete_rows_log_event::do_exec_row(rgi=0x7f6424024060): row found 
 thread2>  ha_rockdb(0x7f64180373d8, t2)::delete_row() = 0
 thread2>    Rows_log_event::find_row(rgi=0x7f6424024060): scan
 thread2>      rnd_next()=HA_ERR_END_OF_FILE
 thread2>  Delete_rows_log_event::do_exec_row(rgi=0x7f6424024060): row NOT found

 thread1>  DBImpl::WriteImpl wrote a batch with sequence=17

Note that both INSERT ... and DELETE start with {{Transaction got snapshot 16 }}. This why DELETE is able to find first 4 rows, but fails to find the 5th - it simily doesn't see it.

And it doesn't see it because the other transaction has not committed yet. See the last line: DBImpl::WriteImpl wrote a batch with sequence=17 This happens after the DELETE has started and failed. I am not sure why.

Comment by Sergei Petrunia [ 2018-05-24 ]

Stopped the slave when it is running a DELETE:

  Thread 3 (Thread 0x7fffd84f8700 (LWP 21869)):
  #0  Rows_log_event::find_row (this=0x7fff7c01e6b0, rgi=0x7fff7c0221f0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:13074
  #1  0x0000555555eb54f3 in Delete_rows_log_event::do_exec_row (this=0x7fff7c01e6b0, rgi=0x7fff7c0221f0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:13462
  #2  0x0000555555eaf78c in Rows_log_event::do_apply_event (this=0x7fff7c01e6b0, rgi=0x7fff7c0221f0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:11022
  #3  0x0000555555a66857 in Log_event::apply_event (this=0x7fff7c01e6b0, rgi=0x7fff7c0221f0) at /home/psergey/dev-git/10.2-r12/sql/log_event.h:1452
  #4  0x0000555555a5b84f in apply_event_and_update_pos_apply (ev=0x7fff7c01e6b0, thd=0x7fff70000b00, rgi=0x7fff7c0221f0, reason=0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3570
  #5  0x0000555555a5be50 in apply_event_and_update_pos_for_parallel (ev=0x7fff7c01e6b0, thd=0x7fff70000b00, rgi=0x7fff7c0221f0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3745
  #6  0x0000555555cc1f3a in rpt_handle_event (qev=0x7fff7c0233d0, rpt=0x7fff7c00a710) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:50
  #7  0x0000555555cc4eeb in handle_rpl_parallel_thread (arg=0x7fff7c00a710) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:1300
  #8  0x00005555560e2922 in pfs_spawn_thread (arg=0x7fff7c00aba0) at /home/psergey/dev-git/10.2-r12/storage/perfschema/pfs.cc:1862
  #9  0x00007ffff64fa6aa in start_thread (arg=0x7fffd84f8700) at pthread_create.c:333
  #10 0x00007ffff5a02eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The other thread is here:

(gdb) wher
  #0  0x00007ffff59f998d in fdatasync () at ../sysdeps/unix/syscall-template.S:81
  #1  0x00007fffeb9c760f in rocksdb::PosixWritableFile::Sync (this=0x555557d19050) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/env/io_posix.cc:849
  #2  0x00007fffeb91c760 in rocksdb::WritableFileWriter::SyncInternal (this=0x555557d190d0, use_fsync=false) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/util/file_reader_writer.cc:350
  #3  0x00007fffeb91c563 in rocksdb::WritableFileWriter::SyncWithoutFlush (this=0x555557d190d0, use_fsync=false) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/util/file_reader_writer.cc:338
  #4  0x00007fffeb6c6151 in rocksdb::DBImpl::SyncWAL (this=0x555557d0c4c0) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/db/db_impl.cc:722
  #5  0x00007fffeb6c5c80 in rocksdb::DBImpl::FlushWAL (this=0x555557d0c4c0, sync=true) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/db/db_impl.cc:677
  #6  0x00007fffeb73a755 in rocksdb::DBImpl::WriteImplWALOnly (this=0x555557d0c4c0, write_options=..., my_batch=0x7fff74019c50, callback=0x0, log_used=0x7fff74013d48, log_ref=0, seq_used=0x0, pre_release_callback=0x0) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/db/db_impl_write.cc:605
  #7  0x00007fffeb737a51 in rocksdb::DBImpl::WriteImpl (this=0x555557d0c4c0, write_options=..., my_batch=0x7fff74019c50, callback=0x0, log_used=0x7fff74013d48, log_ref=0, disable_memtable=true, seq_used=0x0, pre_release_callback=0x0) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/db/db_impl_write.cc:96
  #8  0x00007fffeba7b174 in rocksdb::WriteCommittedTxn::PrepareInternal (this=0x7fff74013d40) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:225
  #9  0x00007fffeba7ad88 in rocksdb::PessimisticTransaction::Prepare (this=0x7fff74013d40) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/rocksdb/utilities/transactions/pessimistic_transaction.cc:197
  #10 0x00007fffeb60d6c6 in myrocks::Rdb_transaction_impl::prepare (this=0x7fff74013a30, name="\000\000\000\000\000\000\000\001\030\000MySQLXid\002\000\000\000\000\000\000\000\231\000\000\000\000\000\000") at /home/psergey/dev-git/10.2-r12/storage/rocksdb/ha_rocksdb.cc:2537
  #11 0x00007fffeb5db4f8 in myrocks::rocksdb_prepare (hton=0x555557a984f0, thd=0x7fff74000b00, prepare_tx=true) at /home/psergey/dev-git/10.2-r12/storage/rocksdb/ha_rocksdb.cc:3216
  #12 0x0000555555d7d846 in prepare_or_error (ht=0x555557a984f0, thd=0x7fff74000b00, all=true) at /home/psergey/dev-git/10.2-r12/sql/handler.cc:1160
  #13 0x0000555555d7e0fb in ha_commit_trans (thd=0x7fff74000b00, all=true) at /home/psergey/dev-git/10.2-r12/sql/handler.cc:1439
  #14 0x0000555555c681f6 in trans_commit (thd=0x7fff74000b00) at /home/psergey/dev-git/10.2-r12/sql/transaction.cc:307
  #15 0x0000555555ea8a3f in Xid_log_event::do_apply_event (this=0x7fff7c0217d0, rgi=0x7fff7c01b8a0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:8503
  #16 0x0000555555a66857 in Log_event::apply_event (this=0x7fff7c0217d0, rgi=0x7fff7c01b8a0) at /home/psergey/dev-git/10.2-r12/sql/log_event.h:1452
  #17 0x0000555555a5b84f in apply_event_and_update_pos_apply (ev=0x7fff7c0217d0, thd=0x7fff74000b00, rgi=0x7fff7c01b8a0, reason=0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3570
  #18 0x0000555555a5be50 in apply_event_and_update_pos_for_parallel (ev=0x7fff7c0217d0, thd=0x7fff74000b00, rgi=0x7fff7c01b8a0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3745
  #19 0x0000555555cc1f3a in rpt_handle_event (qev=0x7fff7c0218c0, rpt=0x7fff7c00a918) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:50
  #20 0x0000555555cc4eeb in handle_rpl_parallel_thread (arg=0x7fff7c00a918) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:1300
  #21 0x00005555560e2922 in pfs_spawn_thread (arg=0x7fff7c00b080) at /home/psergey/dev-git/10.2-r12/storage/perfschema/pfs.cc:1862
  #22 0x00007ffff64fa6aa in start_thread (arg=0x7fffd84ae700) at pthread_create.c:333
  #23 0x00007ffff5a02eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

(gdb) p this
  $85 = (Xid_log_event * const) 0x7fff7c0217d0
(gdb) p this->log_pos
  $86 = 1317
(gdb) p this->xid
  $88 = 119

Comment by Sergei Petrunia [ 2018-05-24 ]

Matching part of the binlog:

#180524 21:03:33 server id 1  end_log_pos 1128 CRC32 0xd0f7c8f7         GTID 0-1-4 trans
/*!100001 SET @@session.gtid_seq_no=4*//*!*/;
BEGIN
/*!*/;
# at 1128
# at 1188
#180524 21:03:33 server id 1  end_log_pos 1188 CRC32 0x2780c971         Annotate_rows:
#Q> INSERT INTO t2 VALUES (5),(6),(7),(8)
#180524 21:03:33 server id 1  end_log_pos 1233 CRC32 0x925d70a4         Table_map: `test`.`t2` mapped to number 19
# at 1233
#180524 21:03:33 server id 1  end_log_pos 1286 CRC32 0x1ff84ae4         Write_rows: table id 19 flags: STMT_END_F
 
BINLOG '
df4GWxMBAAAALQAAANEEAAAAABMAAAAAAAEABHRlc3QAAnQyAAEDAAGkcF2S
df4GWxcBAAAANQAAAAYFAAAAABMAAAAAAAEAAf/+BQAAAP4GAAAA/gcAAAD+CAAAAORK+B8=
'/*!*/;
# at 1286
#180524 21:03:33 server id 1  end_log_pos 1317 CRC32 0x08629bd6         Xid = 119
COMMIT/*!*/;
# at 1317
#180524 21:03:33 server id 1  end_log_pos 1359 CRC32 0xf84e9ffc         GTID 0-1-5 trans

The test would have failed:

                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows_v1 event on table test.t2; Can't find record in 't2', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1514

                Parallel_Mode: conservative

Comment by Sergei Petrunia [ 2018-05-28 ]

Found this in sql_parallel.cc which may or may not be a problem:

      Event groups that group-committed together on the master can be run
      in parallel with each other without restrictions. But one batch of
      group-commits may not start before all groups in the previous batch
      have initiated their commit phase; we set up rgi->gco to ensure that.

It says initiated, I'm wondering how it's possible to tell that a commit "has initiated" (but not finished?)

Comment by Sergei Petrunia [ 2018-05-28 ]

INSERTer thread:

AAA 140713803347712 Transaction got snapshot 16 
AAA 140713803347712: ha_rockdb(0x7ffa1c0373d8, t2)::write_row()= 0
AAA 140713803347712: ha_rockdb(0x7ffa1c0373d8, t2)::write_row()= 0
AAA 140713803347712: ha_rockdb(0x7ffa1c0373d8, t2)::write_row()= 0
AAA 140713803347712: ha_rockdb(0x7ffa1c0373d8, t2)::write_row()= 0
AAA 140713803347712: mark_start_commit_inner
AAA 140713803347712: mark_start_commit_inner: calling mysql_cond_broadcast
AAA 140713803347712: rocksdb_prepare

(let's call the above Step1)

Replication code calls mark_start_commit_inner() to wake up the thread that is
executing the subsequent event group. Note that the effects of the INSERT are
not yet made globally visible in MyRocks (even rocksdb_prepare() is called
after mark_start_commit_inner() has returned).

The DELETEer thread starts to execute its event:
(let's call the below Step2)

AAA 140713803650816:   Rows_log_event::find_row(rgi=0x7ffa3001ee80): scan
AAA 140713803650816 Transaction got snapshot 16 
AAA 140713803650816:     rnd_next()=0
AAA 140713803650816: Delete_rows_log_event::do_exec_row(rgi=0x7ffa3001ee80): row found 
AAA 140713803650816: ha_rockdb(0x7ffa1c0373d8, t2)::delete_row() = 0
AAA 140713803650816:   Rows_log_event::find_row(rgi=0x7ffa3001ee80): scan
AAA 140713803650816:     rnd_next()=0
AAA 140713803650816: Delete_rows_log_event::do_exec_row(rgi=0x7ffa3001ee80): row found 
AAA 140713803650816: ha_rockdb(0x7ffa1c0373d8, t2)::delete_row() = 0
AAA 140713803650816:   Rows_log_event::find_row(rgi=0x7ffa3001ee80): scan
AAA 140713803650816:     rnd_next()=0
AAA 140713803650816: Delete_rows_log_event::do_exec_row(rgi=0x7ffa3001ee80): row found 
AAA 140713803650816: ha_rockdb(0x7ffa1c0373d8, t2)::delete_row() = 0
AAA 140713803650816:   Rows_log_event::find_row(rgi=0x7ffa3001ee80): scan
AAA 140713803650816:     rnd_next()=0
AAA 140713803650816: Delete_rows_log_event::do_exec_row(rgi=0x7ffa3001ee80): row found 
AAA 140713803650816: ha_rockdb(0x7ffa1c0373d8, t2)::delete_row() = 0
AAA 140713803650816:   Rows_log_event::find_row(rgi=0x7ffa3001ee80): scan
AAA 140713803650816:     rnd_next()=HA_ERR_END_OF_FILE
AAA 140713803650816: Delete_rows_log_event::do_exec_row(rgi=0x7ffa3001ee80): row NOT found

The DELETEr thread predictably fails to find the row:

2018-05-28 17:03:39 140713803650816 [ERROR] mysqld: Can't find record in 't2'
2018-05-28 17:03:39 140713803650816 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table test.t2; Can't find record in 't2', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1514, Gtid 0-1-5, Internal MariaDB error code: 1032
2018-05-28 17:03:39 140713803650816 [Warning] Slave: Can't find record in 't2' Error_code: 1032
2018-05-28 17:03:39 140713803650816 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 1086
2018-05-28 17:03:39 140713803953920 [Note] Error reading relay log event: slave SQL thread was killed

and only now the INSERTer commits (and so the rows that we failed to delete are
made visible):

(let's call the below Step1A)

AAA 140713803347712: rocksdb_commit_ordered doing commit
AAA 140713803347712 DBImpl::WriteImpl wrote a batch with sequence=17
AAA 140713803347712: rocksdb_commit: _ordered has committed already

Comment by Sergei Petrunia [ 2018-05-28 ]

Why does Step2 start before Step1A?

This is Ok in regular circumstances as the transaction that was running in Step1 (let's call it TRX-1) has already obtained locks on all rows that it is going to modify in Step1. If the transaction in Step2 needs to modify the same rows, it will block until the lock is released (in Step1A).

This works when the table has a unique index and the code that applies RBR event on the slave does a locking index lookup. The lookup fails with
HA_ERR_LOCK_DEADLOCK error. Parallel slave re-tries the transaction which gets a new snapshot (a stack trace of this happening: https://gist.github.com/spetrunia/9caccf07905a3cf6434be87d5bfb6e23 ) and replication proceeds.

Comment by Sergei Petrunia [ 2018-05-29 ]

Used debugger and injected my_sleep() calls to force the same to happen for InnoDB (for the example with the unique index).

InnoDB doesn't return an error, instead, it waits for the row lock and sleeps:
log: https://gist.github.com/spetrunia/7c2227e7a9a23e9cb746c8182b7483b0

DELETEr thread sleeping:

  Thread 3 (Thread 0x7fffd04f8700 (LWP 24688)):
  #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  #1  0x00005555561bbebc in os_event::wait (this=0x7fff700123f0) at /home/psergey/dev-git/10.2-r12/storage/innobase/os/os0event.cc:166
  #2  0x00005555561bb8cb in os_event::wait_low (this=0x7fff700123f0, reset_sig_count=1) at /home/psergey/dev-git/10.2-r12/storage/innobase/os/os0event.cc:336
  #3  0x00005555561bbc71 in os_event_wait_low (event=0x7fff700123f0, reset_sig_count=0) at /home/psergey/dev-git/10.2-r12/storage/innobase/os/os0event.cc:535
  #4  0x000055555618a44b in lock_wait_suspend_thread (thr=0x7fff70034c38) at /home/psergey/dev-git/10.2-r12/storage/innobase/lock/lock0wait.cc:355
  #5  0x0000555556239474 in row_mysql_handle_errors (new_err=0x7fffd04f613c, trx=0x7fffe80ba7e8, thr=0x7fff70034c38, savept=0x0) at /home/psergey/dev-git/10.2-r12/storage/innobase/row/row0mysql.cc:732
  #6  0x0000555556272f2c in row_search_mvcc (buf=0x7fff70031198 "\377\005", mode=PAGE_CUR_GE, prebuilt=0x7fff70034238, match_mode=1, direction=0) at /home/psergey/dev-git/10.2-r12/storage/innobase/row/row0sel.cc:5673
  #7  0x0000555556104db1 in ha_innobase::index_read (this=0x7fff70033a48, buf=0x7fff70031198 "\377\005", key_ptr=0x7fff70031370 "\005", key_len=4, find_flag=HA_READ_KEY_EXACT) at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:9501
  #8  0x0000555556106383 in ha_innobase::rnd_pos (this=0x7fff70033a48, buf=0x7fff70031198 "\377\005", pos=0x7fff70031370 "\005") at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:10049
  #9  0x0000555555d8bb7b in handler::rnd_pos_by_record (this=0x7fff70033a48, record=0x7fff70031198 "\377\005") at /home/psergey/dev-git/10.2-r12/sql/handler.h:3235
  #10 0x0000555555eb71d4 in handler::ha_rnd_pos_by_record (this=0x7fff70033a48, buf=0x7fff70031198 "\377\005") at /home/psergey/dev-git/10.2-r12/sql/sql_class.h:5786
  #11 0x0000555555eb4885 in Rows_log_event::find_row (this=0x7fff7c023ba0, rgi=0x7fff7c021f50) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:13124
  #12 0x0000555555eb563f in Delete_rows_log_event::do_exec_row (this=0x7fff7c023ba0, rgi=0x7fff7c021f50) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:13462
  #13 0x0000555555eaf8d8 in Rows_log_event::do_apply_event (this=0x7fff7c023ba0, rgi=0x7fff7c021f50) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:11022
  #14 0x0000555555a66857 in Log_event::apply_event (this=0x7fff7c023ba0, rgi=0x7fff7c021f50) at /home/psergey/dev-git/10.2-r12/sql/log_event.h:1452
  #15 0x0000555555a5b84f in apply_event_and_update_pos_apply (ev=0x7fff7c023ba0, thd=0x7fff70000b00, rgi=0x7fff7c021f50, reason=0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3570
  #16 0x0000555555a5be50 in apply_event_and_update_pos_for_parallel (ev=0x7fff7c023ba0, thd=0x7fff70000b00, rgi=0x7fff7c021f50) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3745
  #17 0x0000555555cc2006 in rpt_handle_event (qev=0x7fff7c023e20, rpt=0x7fff7c00a710) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:50
  #18 0x0000555555cc5003 in handle_rpl_parallel_thread (arg=0x7fff7c00a710) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:1305
  #19 0x00005555560e2a6e in pfs_spawn_thread (arg=0x7fff7c00aba0) at /home/psergey/dev-git/10.2-r12/storage/perfschema/pfs.cc:1862
  #20 0x00007ffff64fa6aa in start_thread (arg=0x7fffd04f8700) at pthread_create.c:333

INSERTer thread (the my_sleep call is my debugging code that forces it to stop there):

  Thread 6 (Thread 0x7fffd04ae700 (LWP 24689)):
  #0  0x00007ffff59f9743 in select () at ../sysdeps/unix/syscall-template.S:81
  #1  0x00005555564d4e65 in my_sleep (m_seconds=500000) at /home/psergey/dev-git/10.2-r12/mysys/my_sleep.c:29
  #2  0x000055555611499a in innobase_xa_prepare (hton=0x555557940570, thd=0x7fff74000b00, prepare_trx=true) at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:17388
  #3  0x0000555555d7d992 in prepare_or_error (ht=0x555557940570, thd=0x7fff74000b00, all=true) at /home/psergey/dev-git/10.2-r12/sql/handler.cc:1160
  #4  0x0000555555d7e247 in ha_commit_trans (thd=0x7fff74000b00, all=true) at /home/psergey/dev-git/10.2-r12/sql/handler.cc:1439
  #5  0x0000555555c682c2 in trans_commit (thd=0x7fff74000b00) at /home/psergey/dev-git/10.2-r12/sql/transaction.cc:307
  #6  0x0000555555ea8b8b in Xid_log_event::do_apply_event (this=0x7fff7c021310, rgi=0x7fff7c01b8a0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:8503
  #7  0x0000555555a66857 in Log_event::apply_event (this=0x7fff7c021310, rgi=0x7fff7c01b8a0) at /home/psergey/dev-git/10.2-r12/sql/log_event.h:1452
  #8  0x0000555555a5b84f in apply_event_and_update_pos_apply (ev=0x7fff7c021310, thd=0x7fff74000b00, rgi=0x7fff7c01b8a0, reason=0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3570
  #9  0x0000555555a5be50 in apply_event_and_update_pos_for_parallel (ev=0x7fff7c021310, thd=0x7fff74000b00, rgi=0x7fff7c01b8a0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3745
  #10 0x0000555555cc2006 in rpt_handle_event (qev=0x7fff7c021400, rpt=0x7fff7c00a918) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:50
  #11 0x0000555555cc5003 in handle_rpl_parallel_thread (arg=0x7fff7c00a918) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:1305
  #12 0x00005555560e2a6e in pfs_spawn_thread (arg=0x7fff7c00b080) at /home/psergey/dev-git/10.2-r12/storage/perfschema/pfs.cc:1862
  #13 0x00007ffff64fa6aa in start_thread (arg=0x7fffd04ae700) at pthread_create.c:333

Testcase: https://gist.github.com/spetrunia/ce9bd74571fb252427364e3c57aebfdc
Diffs: https://gist.github.com/spetrunia/bd91e4e763203a03f62ca6f8bf3b11a2
https://gist.github.com/spetrunia/840739e23eed3cb42ef28da458afccf1

Comment by Sergei Petrunia [ 2018-05-29 ]

Running InnoDB testcase without the primary key:

--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
--source include/master-slave.inc
 
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1),(2),(3),(4);
 
CREATE TABLE t2 (a INT) ENGINE=InnoDB SELECT * FROM t1;
INSERT INTO t2 VALUES (5),(6),(7),(8);
DELETE FROM t2;
 
--sync_slave_with_master
 
# Cleanup
--connection master
DROP TABLE t1, t2;
--source include/rpl_end.inc

AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736688260864: mark_start_commit_inner
AAA 140736688260864: mark_start_commit_inner: calling mysql_cond_broadcast
AAA 140736688260864: innobase_xa_prepare() called trx_prepare_for_mysql
AAA 140736688260864: innobase_commit_ordered()
AAA 140736688260864: innobase_commit() committing on its own
AAA 140736687957760: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736687957760: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736687957760: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736687957760: ha_innobase(0x7fff700335f8, t2)::write_row()=0
AAA 140736687957760: mark_start_commit_inner
AAA 140736687957760: mark_start_commit_inner: calling mysql_cond_broadcast
AAA 140736687957760: innobase_xa_prepare() called trx_prepare_for_mysql
AAA 140736688260864:   Rows_log_event::find_row(rgi=0x7fff7c023b40): scan
AAA 140736688260864:     rnd_next()=0
2018-05-29 13:59:05 140736688260864 [Note] The slave is applying a ROW event on behalf of a DELETE statement on table t2 and is currently taking a considerable amount of time (300 seconds). This is due to the fact that it is scanning the table while looking up records to be processed. Consider adding a primary key (or unique key) to the table to improve performance.
AAA 140736688260864: Delete_rows_log_event::do_exec_row(rgi=0x7fff7c023b40): row found
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::delete_row()=0
AAA 140736688260864:   Rows_log_event::find_row(rgi=0x7fff7c023b40): scan
AAA 140736688260864:     rnd_next()=0
AAA 140736688260864: Delete_rows_log_event::do_exec_row(rgi=0x7fff7c023b40): row found
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::delete_row()=0
AAA 140736688260864:   Rows_log_event::find_row(rgi=0x7fff7c023b40): scan
AAA 140736688260864:     rnd_next()=0
AAA 140736688260864: Delete_rows_log_event::do_exec_row(rgi=0x7fff7c023b40): row found
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::delete_row()=0
AAA 140736688260864:   Rows_log_event::find_row(rgi=0x7fff7c023b40): scan
AAA 140736688260864:     rnd_next()=0
AAA 140736688260864: Delete_rows_log_event::do_exec_row(rgi=0x7fff7c023b40): row found
AAA 140736688260864: ha_innobase(0x7fff700335f8, t2)::delete_row()=0
AAA 140736688260864:   Rows_log_event::find_row(rgi=0x7fff7c023b40): scan

The INSERTer is artificially held here in the return from innobase_xa_prepare :

  Thread 6 (Thread 0x7fffd04ae700 (LWP 31375)):
  #0  0x00007ffff59f9743 in select () at ../sysdeps/unix/syscall-template.S:81
  #1  0x00005555564d4e65 in my_sleep (m_seconds=500000) at /home/psergey/dev-git/10.2-r12/mysys/my_sleep.c:29
  #2  0x000055555611499a in innobase_xa_prepare (hton=0x555557940570, thd=0x7fff74000b00, prepare_trx=true) at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:17388
  #3  0x0000555555d7d992 in prepare_or_error (ht=0x555557940570, thd=0x7fff74000b00, all=true) at /home/psergey/dev-git/10.2-r12/sql/handler.cc:1160
  #4  0x0000555555d7e247 in ha_commit_trans (thd=0x7fff74000b00, all=true) at /home/psergey/dev-git/10.2-r12/sql/handler.cc:1439
  #5  0x0000555555c682c2 in trans_commit (thd=0x7fff74000b00) at /home/psergey/dev-git/10.2-r12/sql/transaction.cc:307
  #6  0x0000555555ea8b8b in Xid_log_event::do_apply_event (this=0x7fff7c0212b0, rgi=0x7fff7c01b840) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:8503
  #7  0x0000555555a66857 in Log_event::apply_event (this=0x7fff7c0212b0, rgi=0x7fff7c01b840) at /home/psergey/dev-git/10.2-r12/sql/log_event.h:1452
  #8  0x0000555555a5b84f in apply_event_and_update_pos_apply (ev=0x7fff7c0212b0, thd=0x7fff74000b00, rgi=0x7fff7c01b840, reason=0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3570
  #9  0x0000555555a5be50 in apply_event_and_update_pos_for_parallel (ev=0x7fff7c0212b0, thd=0x7fff74000b00, rgi=0x7fff7c01b840) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3745
  #10 0x0000555555cc2006 in rpt_handle_event (qev=0x7fff7c0213a0, rpt=0x7fff7c00a918) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:50
  #11 0x0000555555cc5003 in handle_rpl_parallel_thread (arg=0x7fff7c00a918) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:1305
  #12 0x00005555560e2a6e in pfs_spawn_thread (arg=0x7fff7c00aba0) at /home/psergey/dev-git/10.2-r12/storage/perfschema/pfs.cc:1862
  #13 0x00007ffff64fa6aa in start_thread (arg=0x7fffd04ae700) at pthread_create.c:333
  #14 0x00007ffff5a02eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

The DELETEr has deleted first four rows (which it could see) and then started another full table scan to read the 5th row (which is not yet committed) and is waiting on a lock here:

  Thread 3 (Thread 0x7fffd04f8700 (LWP 31374)):
  #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
  #1  0x00005555561bbebc in os_event::wait (this=0x7fff7002e500) at /home/psergey/dev-git/10.2-r12/storage/innobase/os/os0event.cc:166
  #2  0x00005555561bb8cb in os_event::wait_low (this=0x7fff7002e500, reset_sig_count=1) at /home/psergey/dev-git/10.2-r12/storage/innobase/os/os0event.cc:336
  #3  0x00005555561bbc71 in os_event_wait_low (event=0x7fff7002e500, reset_sig_count=0) at /home/psergey/dev-git/10.2-r12/storage/innobase/os/os0event.cc:535
  #4  0x000055555618a44b in lock_wait_suspend_thread (thr=0x7fff700347e0) at /home/psergey/dev-git/10.2-r12/storage/innobase/lock/lock0wait.cc:355
  #5  0x0000555556239474 in row_mysql_handle_errors (new_err=0x7fffd04f608c, trx=0x7fffe80ba7e8, thr=0x7fff700347e0, savept=0x0) at /home/psergey/dev-git/10.2-r12/storage/innobase/row/row0mysql.cc:732
  #6  0x0000555556272f2c in row_search_mvcc (buf=0x7fff7002e018 "\375\005", mode=PAGE_CUR_G, prebuilt=0x7fff70033de8, match_mode=0, direction=0) at /home/psergey/dev-git/10.2-r12/storage/innobase/row/row0sel.cc:5673
  #7  0x0000555556104db1 in ha_innobase::index_read (this=0x7fff700335f8, buf=0x7fff7002e018 "\375\005", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:9501
  #8  0x000055555610602c in ha_innobase::index_first (this=0x7fff700335f8, buf=0x7fff7002e018 "\375\005") at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:9921
  #9  0x0000555556106237 in ha_innobase::rnd_next (this=0x7fff700335f8, buf=0x7fff7002e018 "\375\005") at /home/psergey/dev-git/10.2-r12/storage/innobase/handler/ha_innodb.cc:10014
  #10 0x0000555555d818ab in handler::ha_rnd_next (this=0x7fff700335f8, buf=0x7fff7002e018 "\375\005") at /home/psergey/dev-git/10.2-r12/sql/handler.cc:2578
  #11 0x0000555555eb4f9e in Rows_log_event::find_row (this=0x7fff7c023b40, rgi=0x7fff7c021ef0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:13305
  #12 0x0000555555eb563f in Delete_rows_log_event::do_exec_row (this=0x7fff7c023b40, rgi=0x7fff7c021ef0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:13462
  #13 0x0000555555eaf8d8 in Rows_log_event::do_apply_event (this=0x7fff7c023b40, rgi=0x7fff7c021ef0) at /home/psergey/dev-git/10.2-r12/sql/log_event.cc:11022
  #14 0x0000555555a66857 in Log_event::apply_event (this=0x7fff7c023b40, rgi=0x7fff7c021ef0) at /home/psergey/dev-git/10.2-r12/sql/log_event.h:1452
  #15 0x0000555555a5b84f in apply_event_and_update_pos_apply (ev=0x7fff7c023b40, thd=0x7fff70000b00, rgi=0x7fff7c021ef0, reason=0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3570
  #16 0x0000555555a5be50 in apply_event_and_update_pos_for_parallel (ev=0x7fff7c023b40, thd=0x7fff70000b00, rgi=0x7fff7c021ef0) at /home/psergey/dev-git/10.2-r12/sql/slave.cc:3745
  #17 0x0000555555cc2006 in rpt_handle_event (qev=0x7fff7c023dc0, rpt=0x7fff7c00a710) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:50
  #18 0x0000555555cc5003 in handle_rpl_parallel_thread (arg=0x7fff7c00a710) at /home/psergey/dev-git/10.2-r12/sql/rpl_parallel.cc:1305
  #19 0x00005555560e2a6e in pfs_spawn_thread (arg=0x7fff7c00aba0) at /home/psergey/dev-git/10.2-r12/storage/perfschema/pfs.cc:1862
  #20 0x00007ffff64fa6aa in start_thread (arg=0x7fffd04f8700) at pthread_create.c:333

Comment by Sergei Petrunia [ 2018-05-31 ]

https://lists.launchpad.net/maria-developers/msg11306.html

Comment by Elena Stepanova [ 2018-06-03 ]

The test case below fails both with conservative and optimistic parallel replication. Note command-line options – higher number of parallel threads seems to make a difference.

# Run with --mem --mysqld=--plugin-load-add=ha_rocksdb --mysqld=--slave-parallel-threads=8 --repeat=20 --mysqld=--slave_parallel_mode=optimistic
 
--source include/have_binlog_format_row.inc
--source include/master-slave.inc
 
call mtr.add_suppression("Can't find record in");
 
CREATE TABLE t1 ENGINE=RocksDB SELECT 1;
DELETE FROM t1;
INSERT INTO t1  SELECT 1;
DELETE FROM t1;
--sync_slave_with_master
 
# Cleanup
--connection master
DROP TABLE t1;
--source include/rpl_end.inc

2018-06-04  2:29:42 20 [ERROR] Slave worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
2018-06-04  2:29:42 20 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 0-1-3, Internal MariaDB error code: 1205
2018-06-04  2:29:42 20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
2018-06-04  2:29:42 20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 1055
2018-06-04  2:29:42 13 [Note] Error reading relay log event: slave SQL thread was killed
2018-06-04  2:29:42 19 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-06-04  2:29:42 19 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-06-04  2:29:42 19 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 1055
2018-06-04  2:29:42 18 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-06-04  2:29:42 18 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2018-06-04  2:29:42 18 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 1055
2018-06-04  2:29:42 13 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 1055

Increasing slave_transaction_retries doesn't help, it just makes the slave struggle longer. Maybe some changes in lock wait timeouts would help, but I don't know what they should be.

Generated at Thu Feb 08 08:27:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.