[MDEV-16936] Master and Galera Cluster as slave Created: 2018-08-10  Updated: 2023-06-06  Resolved: 2023-06-06

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication
Affects Version/s: 10.2.16
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Aurélien LEQUOY Assignee: Andrei Elkin
Resolution: Won't Fix Votes: 1
Labels: galera, innodb, replication
Environment:

Debian 9


Issue Links:
Relates
relates to MDEV-18641 "Resource deadlock avoided during COM... Closed
relates to MDEV-19572 Async slave on Galera node stops due ... Closed

 Description   

Master : MariaDB 5.5.28 (ROW) => GaleraCluster 6 Node on MariaDB 10.2.16 (2 segments of 3 nodes + Arbitrator)

This config was set for make a migration from MariaDB 5.5.28 to MariaDB 10.2.16

This replication broken in night but i saw this on morning :

show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 10.241.4.44
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000074
          Read_Master_Log_Pos: 391473581
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 557
        Relay_Master_Log_File: mariadb-bin.000074
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 391473581
              Relay_Log_Space: 867
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mariadb-bin.000074' at 391473581, the last event read from 'mariadb-bin.000074' at 391473581, the last byte read from 'mariadb-bin.000074' at 391473600.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 516410943
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
1 row in set (0.00 sec)

As you can see we have file, and position, so i searched in binarylog to see what happen, but it was impossible for me to find this position as you can see :

MariaDB [(none)]> SHOW BINLOG EVENTS IN 'mariadb-bin.000074' limit 4615070,20;
+--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
| Log_name           | Pos       | Event_type     | Server_id | End_log_pos | Info                                          |
+--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
| mariadb-bin.000074 | 391472813 | Table_map      |   3007010 |   391472872 | table_id: 2856 (mall.intuition_sites)         |
| mariadb-bin.000074 | 391472872 | Write_rows_v1  |   3007010 |   391472914 | table_id: 2856 flags: STMT_END_F              |
| mariadb-bin.000074 | 391472914 | Xid            |   3007010 |   391472945 | COMMIT /* xid=470986202 */                    |
| mariadb-bin.000074 | 391472945 | Gtid           |   3007010 |   391472987 | BEGIN GTID 0-3007010-1620164073               |
| mariadb-bin.000074 | 391472987 | Table_map      |   3007010 |   391473114 | table_id: 2713 (mall.offers_scores)           |
| mariadb-bin.000074 | 391473114 | Update_rows_v1 |   3007010 |   391473482 | table_id: 2713 flags: STMT_END_F              |
| mariadb-bin.000074 | 391473482 | Xid            |   3007010 |   391473513 | COMMIT /* xid=470986204 */                    |
| mariadb-bin.000074 | 391473513 | Gtid           |   3007010 |   391473555 | BEGIN GTID 0-3007010-1620164074               |
| mariadb-bin.000074 | 391473555 | Table_map      |   3007010 |   391473688 | table_id: 2857 (mall.intuition)               |
| mariadb-bin.000074 | 391473688 | Write_rows_v1  |   3007010 |   391474639 | table_id: 2857 flags: STMT_END_F              |
| mariadb-bin.000074 | 391474639 | Xid            |   3007010 |   391474670 | COMMIT /* xid=470986206 */                    |
| mariadb-bin.000074 | 391474670 | Gtid           |   3007010 |   391474712 | BEGIN GTID 0-3007010-1620164075               |
| mariadb-bin.000074 | 391474712 | Table_map      |   3007010 |   391474797 | table_id: 2680 (mall.offers_prices_finals_v2) |
| mariadb-bin.000074 | 391474797 | Update_rows_v1 |   3007010 |   391474913 | table_id: 2680 flags: STMT_END_F              |
| mariadb-bin.000074 | 391474913 | Xid            |   3007010 |   391474944 | COMMIT /* xid=470986208 */                    |
| mariadb-bin.000074 | 391474944 | Gtid           |   3007010 |   391474986 | BEGIN GTID 0-3007010-1620164076               |
| mariadb-bin.000074 | 391474986 | Table_map      |   3007010 |   391475063 | table_id: 2669 (mall.offer_flags)             |
| mariadb-bin.000074 | 391475063 | Update_rows_v1 |   3007010 |   391475169 | table_id: 2669 flags: STMT_END_F              |
| mariadb-bin.000074 | 391475169 | Xid            |   3007010 |   391475200 | COMMIT /* xid=470986210 */                    |
| mariadb-bin.000074 | 391475200 | Gtid           |   3007010 |   391475242 | BEGIN GTID 0-3007010-1620164077               |
+--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
20 rows in set (1.57 sec)

After reading my log on PmaControl I saw :

Date	exec_master_log_pos	last_io_errno	last_io_error	last_sql_errno	last_sql_error	master_log_file	wsrep_cluster_size	wsrep_cluster_status	wsrep_local_state_comment	wsrep_ready
2018-07-26 04:01:54	390831285	0		0		mariadb-bin.000071	6	Primary	Synced	ON
2018-07-26 04:02:06	390973194	0		0		mariadb-bin.000071	6	Primary	Synced	ON
2018-07-26 04:02:19	391438449	0		0		mariadb-bin.000071	6	Primary	Synced	ON
2018-07-26 04:02:31	391473581	0		1047	Node has dropped from cluster	mariadb-bin.000071	6	Primary	Synced	ON
2018-07-26 04:02:44	391473581	0		1047	Node has dropped from cluster	mariadb-bin.000071	6	Primary	Synced	ON

as you can see the problem on mariadb-bin.000071 at 2018-07-26 04:02:31, my question is why show salve status show us Master_Log_File: mariadb-bin.000074, it's should be stop on mariadb-bin.000071 no ?

that was my first question, second one when you restarted slave it's start from :

mariadb-bin.000074 : 391473581 (may i put it manually), ofc i got an error but this msg is kinda crazy, it's should say that position does not exist :

 
2018-07-26 13:21:29 140396387120896 [Note] Start binlog_dump to slave_server(1740027789), pos(mariadb-bin.000074, 391473581)
2018-07-26 14:50:31 140396388636416 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473581)
2018-07-26 19:09:43 140396394395392 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473581)
2018-07-26 19:21:36 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1769305198, event_type: 108
2018-07-26 19:22:35 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1863385196, event_type: 0
2018-07-26 19:22:56 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 16777217, event_type: 15
2018-07-26 20:09:28 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 145
2018-07-26 20:10:00 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 3299606784, event_type: 134
2018-07-26 20:21:04 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 19
2018-07-26 20:22:04 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 4244377359, event_type: 15
2018-07-26 20:22:26 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 8388608, event_type: 0
2018-07-26 20:22:29 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 32768, event_type: 0
2018-07-26 20:22:32 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 128, event_type: 128
2018-07-26 20:22:34 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 0
2018-07-26 20:22:37 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1526726656, event_type: 0
2018-07-26 20:22:39 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1499136000, event_type: 0
2018-07-26 20:22:41 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1465473792, event_type: 0
2018-07-26 20:22:43 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1683446107, event_type: 128
2018-07-26 20:22:46 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1617188697, event_type: 0
2018-07-26 20:22:54 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1617188697, event_type: 0
2018-07-26 20:23:16 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 291529815, event_type: 0
2018-07-26 20:23:54 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 2113929216, event_type: 34
2018-07-26 20:23:59 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 3758096384, event_type: 0
2018-07-26 21:43:46 140396391061248 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473482)

Ok now i know where to look :

MariaDB [(none)]> SHOW BINLOG EVENTS IN 'mariadb-bin.000071' from 391473581 limit 0,20;
+--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
| Log_name           | Pos       | Event_type     | Server_id | End_log_pos | Info                                                                             |
+--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
| mariadb-bin.000071 | 391473581 | Gtid           |   3007010 |   391473623 | BEGIN GTID 0-3007010-1611553616                                                  |
| mariadb-bin.000071 | 391473623 | Query          |   3007010 |   391473775 | DROP TEMPORARY TABLE IF EXISTS `mall`.`_replaced_unit` /* generated by server */ |
| mariadb-bin.000071 | 391473775 | Query          |   3007010 |   391473853 | COMMIT                                                                           |
| mariadb-bin.000071 | 391473853 | Gtid           |   3007010 |   391473895 | BEGIN GTID 0-3007010-1611553617                                                  |
| mariadb-bin.000071 | 391473895 | Table_map      |   3007010 |   391473964 | table_id: 2495 (mall.intuition_indexation)                                       |
| mariadb-bin.000071 | 391473964 | Update_rows_v1 |   3007010 |   391474022 | table_id: 2495 flags: STMT_END_F                                                 |
| mariadb-bin.000071 | 391474022 | Xid            |   3007010 |   391474053 | COMMIT /* xid=452403272 */                                                       |
| mariadb-bin.000071 | 391474053 | Gtid           |   3007010 |   391474095 | BEGIN GTID 0-3007010-1611553618                                                  |
| mariadb-bin.000071 | 391474095 | Table_map      |   3007010 |   391474162 | table_id: 2474 (mall.offers_monitor)                                             |
| mariadb-bin.000071 | 391474162 | Update_rows_v1 |   3007010 |   391474254 | table_id: 2474 flags: STMT_END_F                                                 |
| mariadb-bin.000071 | 391474254 | Xid            |   3007010 |   391474285 | COMMIT /* xid=452403274 */                                                       |
| mariadb-bin.000071 | 391474285 | Gtid           |   3007010 |   391474327 | BEGIN GTID 0-3007010-1611553619                                                  |
| mariadb-bin.000071 | 391474327 | Table_map      |   3007010 |   391474386 | table_id: 2479 (mall.intuition_sites)                                            |
| mariadb-bin.000071 | 391474386 | Write_rows_v1  |   3007010 |   391474428 | table_id: 2479 flags: STMT_END_F                                                 |
| mariadb-bin.000071 | 391474428 | Xid            |   3007010 |   391474459 | COMMIT /* xid=452403276 */                                                       |
| mariadb-bin.000071 | 391474459 | Gtid           |   3007010 |   391474501 | BEGIN GTID 0-3007010-1611553620                                                  |
| mariadb-bin.000071 | 391474501 | Table_map      |   3007010 |   391474628 | table_id: 2472 (mall.offers_scores)                                              |
| mariadb-bin.000071 | 391474628 | Update_rows_v1 |   3007010 |   391475004 | table_id: 2472 flags: STMT_END_F                                                 |
| mariadb-bin.000071 | 391475004 | Xid            |   3007010 |   391475035 | COMMIT /* xid=452403278 */                                                       |
| mariadb-bin.000071 | 391475035 | Gtid           |   3007010 |   391475077 | BEGIN GTID 0-3007010-1611553621                                                  |
+--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
20 rows in set (0.00 sec)

and in log :

2018-07-26  4:02:21 140018283419392 [Warning] WSREP: SQL statement was ineffective  thd: 1359262  buf: 190
schema: mall
QUERY: COMMIT
 => Skipping replication
2018-07-26  4:02:21 140018283419392 [Note] WSREP: cluster conflict due to certification failure for threads:
2018-07-26  4:02:21 140018283419392 [Note] WSREP: Victim thread:
   THD: 1359262, mode: local, state: executing, conflict: cert failure, seqno: -1
   SQL: COMMIT
2018-07-26  4:02:21 140018283419392 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'mall'. Query: 'COMMIT', Gtid 0-3007010-1611553616, Internal MariaDB error code: 1180
2018-07-26  4:02:21 140018283419392 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 0-3007010-1611553616, Internal MariaDB error code: 1047
2018-07-26  4:02:21 140018283419392 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000071' at position 391473581

My question is how to make a deadlock from replication and galeracluster, ofc course it's should be only user RO on galera cluster but it's was not.

I was kinda surprise to see a deadlock between the thread of replication and the server, and more about droping a TEMPORARY table :

DROP TEMPORARY TABLE IF EXISTS `mall`.`_replaced_unit` /* generated by server */

It's linked with Galera cluster, because when it's was with server without Galera it's was working perfectly. (ok it's was not clean)



 Comments   
Comment by Jan Lindström [ 2023-06-06 ]

Both versions are EOL.

Generated at Thu Feb 08 08:32:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.