Details

      Description

      Master : MariaDB 5.5.28 (ROW) => GaleraCluster 6 Node on MariaDB 10.2.16 (2 segments of 3 nodes + Arbitrator)

      This config was set for make a migration from MariaDB 5.5.28 to MariaDB 10.2.16

      This replication broken in night but i saw this on morning :

      show slave status\G
      *************************** 1. row ***************************
                     Slave_IO_State: 
                        Master_Host: 10.241.4.44
                        Master_User: replication
                        Master_Port: 3306
                      Connect_Retry: 60
                    Master_Log_File: mariadb-bin.000074
                Read_Master_Log_Pos: 391473581
                     Relay_Log_File: mysqld-relay-bin.000002
                      Relay_Log_Pos: 557
              Relay_Master_Log_File: mariadb-bin.000074
                   Slave_IO_Running: No
                  Slave_SQL_Running: Yes
                    Replicate_Do_DB: 
                Replicate_Ignore_DB: 
                 Replicate_Do_Table: 
             Replicate_Ignore_Table: 
            Replicate_Wild_Do_Table: 
        Replicate_Wild_Ignore_Table: 
                         Last_Errno: 0
                         Last_Error: 
                       Skip_Counter: 0
                Exec_Master_Log_Pos: 391473581
                    Relay_Log_Space: 867
                    Until_Condition: None
                     Until_Log_File: 
                      Until_Log_Pos: 0
                 Master_SSL_Allowed: No
                 Master_SSL_CA_File: 
                 Master_SSL_CA_Path: 
                    Master_SSL_Cert: 
                  Master_SSL_Cipher: 
                     Master_SSL_Key: 
              Seconds_Behind_Master: NULL
      Master_SSL_Verify_Server_Cert: No
                      Last_IO_Errno: 1236
                      Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mariadb-bin.000074' at 391473581, the last event read from 'mariadb-bin.000074' at 391473581, the last byte read from 'mariadb-bin.000074' at 391473600.'
                     Last_SQL_Errno: 0
                     Last_SQL_Error: 
        Replicate_Ignore_Server_Ids: 
                   Master_Server_Id: 516410943
                     Master_SSL_Crl: 
                 Master_SSL_Crlpath: 
                         Using_Gtid: No
                        Gtid_IO_Pos: 
            Replicate_Do_Domain_Ids: 
        Replicate_Ignore_Domain_Ids: 
                      Parallel_Mode: conservative
                          SQL_Delay: 0
                SQL_Remaining_Delay: NULL
            Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
      1 row in set (0.00 sec)
      

      As you can see we have file, and position, so i searched in binarylog to see what happen, but it was impossible for me to find this position as you can see :

      MariaDB [(none)]> SHOW BINLOG EVENTS IN 'mariadb-bin.000074' limit 4615070,20;
      +--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
      | Log_name           | Pos       | Event_type     | Server_id | End_log_pos | Info                                          |
      +--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
      | mariadb-bin.000074 | 391472813 | Table_map      |   3007010 |   391472872 | table_id: 2856 (mall.intuition_sites)         |
      | mariadb-bin.000074 | 391472872 | Write_rows_v1  |   3007010 |   391472914 | table_id: 2856 flags: STMT_END_F              |
      | mariadb-bin.000074 | 391472914 | Xid            |   3007010 |   391472945 | COMMIT /* xid=470986202 */                    |
      | mariadb-bin.000074 | 391472945 | Gtid           |   3007010 |   391472987 | BEGIN GTID 0-3007010-1620164073               |
      | mariadb-bin.000074 | 391472987 | Table_map      |   3007010 |   391473114 | table_id: 2713 (mall.offers_scores)           |
      | mariadb-bin.000074 | 391473114 | Update_rows_v1 |   3007010 |   391473482 | table_id: 2713 flags: STMT_END_F              |
      | mariadb-bin.000074 | 391473482 | Xid            |   3007010 |   391473513 | COMMIT /* xid=470986204 */                    |
      | mariadb-bin.000074 | 391473513 | Gtid           |   3007010 |   391473555 | BEGIN GTID 0-3007010-1620164074               |
      | mariadb-bin.000074 | 391473555 | Table_map      |   3007010 |   391473688 | table_id: 2857 (mall.intuition)               |
      | mariadb-bin.000074 | 391473688 | Write_rows_v1  |   3007010 |   391474639 | table_id: 2857 flags: STMT_END_F              |
      | mariadb-bin.000074 | 391474639 | Xid            |   3007010 |   391474670 | COMMIT /* xid=470986206 */                    |
      | mariadb-bin.000074 | 391474670 | Gtid           |   3007010 |   391474712 | BEGIN GTID 0-3007010-1620164075               |
      | mariadb-bin.000074 | 391474712 | Table_map      |   3007010 |   391474797 | table_id: 2680 (mall.offers_prices_finals_v2) |
      | mariadb-bin.000074 | 391474797 | Update_rows_v1 |   3007010 |   391474913 | table_id: 2680 flags: STMT_END_F              |
      | mariadb-bin.000074 | 391474913 | Xid            |   3007010 |   391474944 | COMMIT /* xid=470986208 */                    |
      | mariadb-bin.000074 | 391474944 | Gtid           |   3007010 |   391474986 | BEGIN GTID 0-3007010-1620164076               |
      | mariadb-bin.000074 | 391474986 | Table_map      |   3007010 |   391475063 | table_id: 2669 (mall.offer_flags)             |
      | mariadb-bin.000074 | 391475063 | Update_rows_v1 |   3007010 |   391475169 | table_id: 2669 flags: STMT_END_F              |
      | mariadb-bin.000074 | 391475169 | Xid            |   3007010 |   391475200 | COMMIT /* xid=470986210 */                    |
      | mariadb-bin.000074 | 391475200 | Gtid           |   3007010 |   391475242 | BEGIN GTID 0-3007010-1620164077               |
      +--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
      20 rows in set (1.57 sec)
      

      After reading my log on PmaControl I saw :

      Date	exec_master_log_pos	last_io_errno	last_io_error	last_sql_errno	last_sql_error	master_log_file	wsrep_cluster_size	wsrep_cluster_status	wsrep_local_state_comment	wsrep_ready
      2018-07-26 04:01:54	390831285	0		0		mariadb-bin.000071	6	Primary	Synced	ON
      2018-07-26 04:02:06	390973194	0		0		mariadb-bin.000071	6	Primary	Synced	ON
      2018-07-26 04:02:19	391438449	0		0		mariadb-bin.000071	6	Primary	Synced	ON
      2018-07-26 04:02:31	391473581	0		1047	Node has dropped from cluster	mariadb-bin.000071	6	Primary	Synced	ON
      2018-07-26 04:02:44	391473581	0		1047	Node has dropped from cluster	mariadb-bin.000071	6	Primary	Synced	ON
      

      as you can see the problem on mariadb-bin.000071 at 2018-07-26 04:02:31, my question is why show salve status show us Master_Log_File: mariadb-bin.000074, it's should be stop on mariadb-bin.000071 no ?

      that was my first question, second one when you restarted slave it's start from :

      mariadb-bin.000074 : 391473581 (may i put it manually), ofc i got an error but this msg is kinda crazy, it's should say that position does not exist :

       
      2018-07-26 13:21:29 140396387120896 [Note] Start binlog_dump to slave_server(1740027789), pos(mariadb-bin.000074, 391473581)
      2018-07-26 14:50:31 140396388636416 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473581)
      2018-07-26 19:09:43 140396394395392 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473581)
      2018-07-26 19:21:36 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1769305198, event_type: 108
      2018-07-26 19:22:35 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1863385196, event_type: 0
      2018-07-26 19:22:56 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 16777217, event_type: 15
      2018-07-26 20:09:28 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 145
      2018-07-26 20:10:00 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 3299606784, event_type: 134
      2018-07-26 20:21:04 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 19
      2018-07-26 20:22:04 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 4244377359, event_type: 15
      2018-07-26 20:22:26 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 8388608, event_type: 0
      2018-07-26 20:22:29 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 32768, event_type: 0
      2018-07-26 20:22:32 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 128, event_type: 128
      2018-07-26 20:22:34 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 0
      2018-07-26 20:22:37 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1526726656, event_type: 0
      2018-07-26 20:22:39 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1499136000, event_type: 0
      2018-07-26 20:22:41 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1465473792, event_type: 0
      2018-07-26 20:22:43 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1683446107, event_type: 128
      2018-07-26 20:22:46 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1617188697, event_type: 0
      2018-07-26 20:22:54 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1617188697, event_type: 0
      2018-07-26 20:23:16 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 291529815, event_type: 0
      2018-07-26 20:23:54 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 2113929216, event_type: 34
      2018-07-26 20:23:59 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 3758096384, event_type: 0
      2018-07-26 21:43:46 140396391061248 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473482)
      

      Ok now i know where to look :

      MariaDB [(none)]> SHOW BINLOG EVENTS IN 'mariadb-bin.000071' from 391473581 limit 0,20;
      +--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
      | Log_name           | Pos       | Event_type     | Server_id | End_log_pos | Info                                                                             |
      +--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
      | mariadb-bin.000071 | 391473581 | Gtid           |   3007010 |   391473623 | BEGIN GTID 0-3007010-1611553616                                                  |
      | mariadb-bin.000071 | 391473623 | Query          |   3007010 |   391473775 | DROP TEMPORARY TABLE IF EXISTS `mall`.`_replaced_unit` /* generated by server */ |
      | mariadb-bin.000071 | 391473775 | Query          |   3007010 |   391473853 | COMMIT                                                                           |
      | mariadb-bin.000071 | 391473853 | Gtid           |   3007010 |   391473895 | BEGIN GTID 0-3007010-1611553617                                                  |
      | mariadb-bin.000071 | 391473895 | Table_map      |   3007010 |   391473964 | table_id: 2495 (mall.intuition_indexation)                                       |
      | mariadb-bin.000071 | 391473964 | Update_rows_v1 |   3007010 |   391474022 | table_id: 2495 flags: STMT_END_F                                                 |
      | mariadb-bin.000071 | 391474022 | Xid            |   3007010 |   391474053 | COMMIT /* xid=452403272 */                                                       |
      | mariadb-bin.000071 | 391474053 | Gtid           |   3007010 |   391474095 | BEGIN GTID 0-3007010-1611553618                                                  |
      | mariadb-bin.000071 | 391474095 | Table_map      |   3007010 |   391474162 | table_id: 2474 (mall.offers_monitor)                                             |
      | mariadb-bin.000071 | 391474162 | Update_rows_v1 |   3007010 |   391474254 | table_id: 2474 flags: STMT_END_F                                                 |
      | mariadb-bin.000071 | 391474254 | Xid            |   3007010 |   391474285 | COMMIT /* xid=452403274 */                                                       |
      | mariadb-bin.000071 | 391474285 | Gtid           |   3007010 |   391474327 | BEGIN GTID 0-3007010-1611553619                                                  |
      | mariadb-bin.000071 | 391474327 | Table_map      |   3007010 |   391474386 | table_id: 2479 (mall.intuition_sites)                                            |
      | mariadb-bin.000071 | 391474386 | Write_rows_v1  |   3007010 |   391474428 | table_id: 2479 flags: STMT_END_F                                                 |
      | mariadb-bin.000071 | 391474428 | Xid            |   3007010 |   391474459 | COMMIT /* xid=452403276 */                                                       |
      | mariadb-bin.000071 | 391474459 | Gtid           |   3007010 |   391474501 | BEGIN GTID 0-3007010-1611553620                                                  |
      | mariadb-bin.000071 | 391474501 | Table_map      |   3007010 |   391474628 | table_id: 2472 (mall.offers_scores)                                              |
      | mariadb-bin.000071 | 391474628 | Update_rows_v1 |   3007010 |   391475004 | table_id: 2472 flags: STMT_END_F                                                 |
      | mariadb-bin.000071 | 391475004 | Xid            |   3007010 |   391475035 | COMMIT /* xid=452403278 */                                                       |
      | mariadb-bin.000071 | 391475035 | Gtid           |   3007010 |   391475077 | BEGIN GTID 0-3007010-1611553621                                                  |
      +--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
      20 rows in set (0.00 sec)
      

      and in log :

      2018-07-26  4:02:21 140018283419392 [Warning] WSREP: SQL statement was ineffective  thd: 1359262  buf: 190
      schema: mall
      QUERY: COMMIT
       => Skipping replication
      2018-07-26  4:02:21 140018283419392 [Note] WSREP: cluster conflict due to certification failure for threads:
      2018-07-26  4:02:21 140018283419392 [Note] WSREP: Victim thread:
         THD: 1359262, mode: local, state: executing, conflict: cert failure, seqno: -1
         SQL: COMMIT
      2018-07-26  4:02:21 140018283419392 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'mall'. Query: 'COMMIT', Gtid 0-3007010-1611553616, Internal MariaDB error code: 1180
      2018-07-26  4:02:21 140018283419392 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 0-3007010-1611553616, Internal MariaDB error code: 1047
      2018-07-26  4:02:21 140018283419392 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000071' at position 391473581
      

      My question is how to make a deadlock from replication and galeracluster, ofc course it's should be only user RO on galera cluster but it's was not.

      I was kinda surprise to see a deadlock between the thread of replication and the server, and more about droping a TEMPORARY table :

      DROP TEMPORARY TABLE IF EXISTS `mall`.`_replaced_unit` /* generated by server */
      

      It's linked with Galera cluster, because when it's was with server without Galera it's was working perfectly. (ok it's was not clean)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Elkin Andrei Elkin
                Reporter:
                Aurelien_LEQUOY Aurélien LEQUOY
              • Votes:
                1 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: