[MDEV-15252] DELETE with FKs crashes Galera nodes if wsrep_slave_threads>1 Created: 2018-02-08  Updated: 2020-08-25  Resolved: 2018-05-02

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2.12, 10.2.13, 10.3.5
Fix Version/s: 10.2.15

Type: Bug Priority: Critical
Reporter: Claudio Nanni Assignee: Sachin Setiya (Inactive)
Resolution: Fixed Votes: 1
Labels: None

Attachments: File c17580.tar.gz     Text File error_log.txt     File server1.cnf     File server2.cnf    
Issue Links:
Duplicate
duplicates MDEV-15611 Due to the failure of foreign key det... Closed

 Description   

It just takes a two(with more is the same) nodes Galera cluster with wsrep_slave_threads>1 on the Slave node(s).
On the Master node import the sql files in sequence and you'll get the crash with error log full of messages of the type:

2018-02-08 11:41:04 140048415356672 [Note] InnoDB: SQL1: DELETE FROM `CA9176ed2945`/)|Z�'
2018-02-08 11:41:04 140048415356672 [Note] InnoDB: SQL2:
2018-02-08 11:41:04 140048415356672 [ERROR] mysqld: Can't find record in 'CA9176ed2945'
2018-02-08 11:41:04 140048415356672 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table c17580.CA9176ed2945; Can't find record in 'CA9176ed2945', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 799009, Internal MariaDB error code: 1032



 Comments   
Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-09 ]

need server.cnf configurations of the Nodes

Comment by Claudio Nanni [ 2018-02-09 ]

It's a brand new out-of-the-box 10.2.12, with default configuration(the one that comes with the package), just need to enable Galera and use wsrep_slave_threads=2 on one of the two nodes(the slave).
After that you import in sequence on the other node:
step_1_create.sql
step_2_import.sql
step_1_delete.sql

The slave will crash.

server2.cnf server1.cnf

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-09 ]

claudio.nanni thank you for configurations ; it's not exactly the default configuration while the wsrep_sst_method is set to xtrabackup-v2 (default is rsync);
I tried to recreate with given configurations on both 10.3 and 10.1 but the deletion at the last step was performed w/o crash

MariaDB [foo]> select count(*) from CA9176ed2945  ;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    6
Current database: foo
 
+----------+
| count(*) |
+----------+
|   108505 |
+----------+
1 row in set (0.046 sec)
 
MariaDB [foo]> select count(*) from CA9176ed2945  ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.000 sec)
 

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-12 ]

was not reproduced either on 10.2.12
note : with additional package percona-xtrabackup-24-2.4.9-1.el7.x86_64
xtrabackup --version
xtrabackup version 2.4.9 based on MySQL server 5.7.13 Linux (x86_64) (revision id: a467167cdd4)

+-------------------------+---------------------------------+
| Variable_name           | Value                           |
+-------------------------+---------------------------------+
| innodb_version          | 5.7.20                          |
| protocol_version        | 10                              |
| slave_type_conversions  |                                 |
| version                 | 10.2.12-MariaDB                 |
| version_comment         | MariaDB Server                  |
| version_compile_machine | x86_64                          |
| version_compile_os      | Linux                           |
| version_malloc_library  | system                          |
| version_ssl_library     | OpenSSL 1.0.1e-fips 11 Feb 2013 |
| wsrep_patch_version     | wsrep_25.21                     |
+-------------------------+---------------------------------+

after step2 :
table CA9176ed2945 was loaded with 108505 rows;
at step 3 deletion succeed

MariaDB [foo]> select count(*) from CA9176ed2945  ;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Comment by Claudio Nanni [ 2018-02-12 ]

Just tested on 10.2.9, it crashes there as well with thousands of:

2018-02-12 11:03:58 140509319476992 [Note] InnoDB:  SQL1: DELETE FROM  `CA9176ed2945`
2018-02-12 11:03:58 140509319476992 [Note] InnoDB:  SQL2: 
2018-02-12 11:03:58 140509319476992 [Note] InnoDB: BF-BF X lock conflict,mode: 1027 supremum: 0
2018-02-12 11:03:58 140509319476992 [Note] InnoDB: conflicts states: my 0 locked 0
RECORD LOCKS space id 155 page no 188 n bits 152 index PRIMARY of table `c1`.`CA9176ed2945` trx id 500768 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 19; compact format; info bits 32

10.1.31 does not crash (I use tarballs but that should not be relevant.)

winstone did you set wsrep_slave_threads>1 ?

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-12 ]

yes , claudio.nanni, just as in the configurations you have supplied :wsrep_slave_thread=2
but I get the 10.2 from the repo, where the release where come with release id 10.2.12-MariaDB , not 10.2.9

Comment by Claudio Nanni [ 2018-02-12 ]

sst method is not important here, you can use any method.
I can reproduce with 10.2.9 and 10.2.12 using tarballs and rpm on RHEL7 64bit package.
10.2.12 also crashes on Ubuntu 16.04 LTS and CentOS via repository.

Interesting enough is that I tested on a RHEL7 VM and there I need to set wsrep_slave_threads>=3 to get the crash, so it could be also resource hence concurrency dependent, so if you can't reproduce it try to use higher values of wsrep_slave_threads.

Comment by Zdravelina Sokolovska (Inactive) [ 2018-02-14 ]

recreated on MariaDB 10.2.13 and xtrabackup version 2.4.9 with 2 Nodes and set wsrep_slave_threads=16
attached error log

Feb 14 19:12:30 localhost systemd: mariadb.service: main process exited, code=killed, status=6/ABRT
Feb 14 19:12:30 localhost systemd: Unit mariadb.service entered failed state.
Feb 14 19:12:30 localhost systemd: mariadb.service failed.
Feb 14 19:12:35 localhost systemd: mariadb.service holdoff time over, scheduling restart.
Feb 14 19:12:35 localhost systemd: Starting MariaDB 10.2.13 database server...
Feb 14 19:12:47 localhost sh: WSREP: Recovered position 68dc3337-1198-11e8-b694-afb950b338bc:108532
Feb 14 19:12:47 localhost mysqld: 2018-02-14 19:12:47 140260210030720 [Note] /usr/sbin/mysqld (mysqld 10.2.13-MariaDB) starting as process 7726 ...
Feb 14 19:14:17 localhost systemd: mariadb.service start operation timed out. Terminating.
Feb 14 19:14:30 localhost systemd: mariadb.service: main process exited, code=exited, status=1/FAILURE
Feb 14 19:14:30 localhost systemd: Failed to start MariaDB 10.2.13 database server.
Feb 14 19:14:30 localhost systemd: Unit mariadb.service entered failed state.
Feb 14 19:14:30 localhost systemd: mariadb.service failed.
Feb 14 19:20:01 localhost systemd: Started Session 199 of user root.
Feb 14 19:20:01 localhost systemd: Starting Session 199 of user root.

2018-02-14 19:12:21 140510129694464 [Note] InnoDB: BF-BF X lock conflict,mode: 1027 supremum: 0
2018-02-14 19:12:21 140510129694464 [Note] InnoDB: conflicts states: my 0 locked 0
RECORD LOCKS space id 5 page no 428 n bits 96 index PRIMARY of table `foo`.`CA9176ed2945` trx id 225747 lock_mode X locks rec but not gap
Record lock, heap no 24 PHYSICAL RECORD: n_fields 19; compact format; info bits 32
 0: len 4; hex 80000005; asc     ;;
 1: len 4; hex 80000010; asc     ;;
 2: len 16; hex 4d415563616333376637383137313232; asc MAUcac37f7817122;;
 3: len 18; hex 363939616664343833363731326335653737; asc 699afd4836712c5e77;;
 4: len 6; hex 0000000371d3; asc     q ;;
 5: len 7; hex 72000001c003f3; asc r      ;;
 6: len 4; hex 800007c4; asc     ;;
 7: len 10; hex 4d415563383838313732; asc MAUc888172;;
 8: len 12; hex 464754346638636534386466; asc FGT4f8ce48df;;
 9: len 24; hex 536368393137376330313761663466356563323062363833; asc Sch9177c017af4f5ec20b683;;
 10: len 4; hex 80000001; asc     ;;
 11: len 9; hex 800000000000024400; asc        D ;;
 12: len 9; hex 80000000000002b214; asc          ;;
 13: len 9; hex 80000000000002153c; asc         <;;
 14: len 9; hex 800000000000027a62; asc        zb;;
 15: len 9; hex 80000000000002153c; asc         <;;
 16: len 9; hex 800000000000027a62; asc        zb;;
 17: len 3; hex 800000; asc    ;;
 18: len 3; hex 800000; asc    ;;
 
Record lock, heap no 25 PHYSICAL RECORD: n_fields 19; compact format; info bits 32
 0: len 4; hex 80000005; asc     ;;
 1: len 4; hex 80000010; asc     ;;
 2: len 16; hex 4d415563616333376637383137313232; asc MAUcac37f7817122;;
 3: len 18; hex 373139326166653435363765316266363464; asc 7192afe4567e1bf64d;;
 4: len 6; hex 0000000371d3; asc     q ;;
 5: len 7; hex 72000001c00470; asc r     p;;
 6: len 4; hex 800007c4; asc     ;;
 7: len 10; hex 4d415563383838313732; asc MAUc888172;;
 8: len 12; hex 464754346638636534386466; asc FGT4f8ce48df;;
 9: len 24; hex 527562636436373334636164356262643866383261386330; asc Rubcd6734cad5bbd8f82a8c0;;
 10: len 4; hex 80000001; asc     ;;
 11: len 9; hex 800000000000000000; asc          ;;
 12: len 9; hex 800000000000000000; asc          ;;
 13: len 9; hex 800000000000000000; asc          ;;
 14: len 9; hex 800000000000000000; asc          ;;
 15: len 9; hex 800000000000000000; asc          ;;
 16: len 9; hex 800000000000000000; asc          ;;
 17: len 3; hex 800000; asc    ;;
 18: len 3; hex 800000; asc    ;;
 
2018-02-14 19:12:21 140510129694464 [Note] InnoDB:  SQL1: DELETE FROM  `CA9176ed2945`▒m▒Z
2018-02-14 19:12:21 140510129694464 [Note] InnoDB:  SQL2: DELETE FROM  `VA13db63946f`▒m▒Z
2018-02-14 19:12:21 140510129694464 [ERROR] mysqld: Can't find record in 'CA9176ed2945'
2018-02-14 19:12:21 140510129694464 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table foo.CA9176ed2945; Can't find record in 'CA9176ed2945', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1460595, Internal MariaDB error code: 1032
2018-02-14 19:12:21 140510129694464 [Warning] WSREP: RBR event 183 Delete_rows_v1 apply warning: 120, 108533
2018-02-14 19:12:21 140510129694464 [Warning] WSREP: Failed to apply app buffer: seqno: 108533, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2018-02-14 19:12:21 140510129694464 [Note] InnoDB: BF-BF X lock conflict,mode: 1027 supremum: 0
2018-02-14 19:12:21 140510129694464 [Note] InnoDB: conflicts states: my 0 locked 0
RECORD LOCKS space id 5 page no 428 n bits 96 index PRIMARY of table `foo`.`CA9176ed2945` trx id 225747 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 19; compact format; info bits 32

Comment by Sachin Setiya (Inactive) [ 2018-04-02 ]

it is independent of sst_method and innodb_autoinc_lock_mode.

Comment by Sachin Setiya (Inactive) [ 2018-04-04 ]

Patch for mdev-15611 also solves this (test case is passing without failing http://lists.askmonty.org/pipermail/commits/2018-April/012274.html )

Comment by Zdravelina Sokolovska (Inactive) [ 2018-04-04 ]

appears also on 10.3.5

2018-04-04 17:30:33 2 [Note] InnoDB:  SQL1: DELETE FROM  `CA9176ed2945`{▒▒Z SQL2: DELETE FROM  `VA13db63946f`▒▒▒Z
2018-04-04 17:30:33 2 [ERROR] mysqld: Can't find record in 'CA9176ed2945'
2018-04-04 17:30:33 2 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table md_15252.CA9176ed2945; Can't find record in 'CA9176ed2945', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 2251485, Internal MariaDB error code: 1032
2018-04-04 17:30:34 2 [Warning] WSREP: RBR event 281 Delete_rows_v1 apply warning: 120, 257581
2018-04-04 17:30:34 2 [ERROR] WSREP: Failed to apply trx: source: 6557e5a0-3743-11e8-8941-fe255bc5c908 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 105705 trx_id: 697453 seqnos (l: 110076, g: 257581, s: 257580, d: 257580, ts: 89926014190718)
2018-04-04 17:30:34 2 [ERROR] WSREP: Failed to apply trx 257581 4 times
2018-04-04 17:30:34 2 [ERROR] WSREP: Node consistency compromised, aborting...
2018-04-04 17:30:34 2 [Note] WSREP: Closing send monitor...
2018-04-04 17:30:34 2 [Note] WSREP: Closed send monitor.
2018-04-04 17:30:34 2 [Note] WSREP: gcomm: terminating thread
2018-04-04 17:30:34 2 [Note] WSREP: gcomm: joining thread
2018-04-04 17:30:34 2 [Note] WSREP: gcomm: closing backend

Comment by Sachin Setiya (Inactive) [ 2018-05-02 ]

patch for mdev-15611 solves this

Comment by Sachin Setiya (Inactive) [ 2018-05-02 ]

Solved by mdev-15611

Generated at Thu Feb 08 08:19:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.