Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
10.2.16
-
Debian 9
Description
Master : MariaDB 5.5.28 (ROW) => GaleraCluster 6 Node on MariaDB 10.2.16 (2 segments of 3 nodes + Arbitrator)
This config was set for make a migration from MariaDB 5.5.28 to MariaDB 10.2.16
This replication broken in night but i saw this on morning :
show slave status\G
|
*************************** 1. row *************************** |
Slave_IO_State:
|
Master_Host: 10.241.4.44 |
Master_User: replication
|
Master_Port: 3306 |
Connect_Retry: 60 |
Master_Log_File: mariadb-bin.000074 |
Read_Master_Log_Pos: 391473581 |
Relay_Log_File: mysqld-relay-bin.000002 |
Relay_Log_Pos: 557 |
Relay_Master_Log_File: mariadb-bin.000074 |
Slave_IO_Running: No
|
Slave_SQL_Running: Yes
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 0 |
Last_Error:
|
Skip_Counter: 0 |
Exec_Master_Log_Pos: 391473581 |
Relay_Log_Space: 867 |
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0 |
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: NULL
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 1236 |
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mariadb-bin.000074' at 391473581, the last event read from 'mariadb-bin.000074' at 391473581, the last byte read from 'mariadb-bin.000074' at 391473600.' |
Last_SQL_Errno: 0 |
Last_SQL_Error:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 516410943 |
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Using_Gtid: No
|
Gtid_IO_Pos:
|
Replicate_Do_Domain_Ids:
|
Replicate_Ignore_Domain_Ids:
|
Parallel_Mode: conservative
|
SQL_Delay: 0 |
SQL_Remaining_Delay: NULL
|
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it |
1 row in set (0.00 sec) |
As you can see we have file, and position, so i searched in binarylog to see what happen, but it was impossible for me to find this position as you can see :
MariaDB [(none)]> SHOW BINLOG EVENTS IN 'mariadb-bin.000074' limit 4615070,20; |
+--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
|
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
|
+--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
|
| mariadb-bin.000074 | 391472813 | Table_map | 3007010 | 391472872 | table_id: 2856 (mall.intuition_sites) | |
| mariadb-bin.000074 | 391472872 | Write_rows_v1 | 3007010 | 391472914 | table_id: 2856 flags: STMT_END_F | |
| mariadb-bin.000074 | 391472914 | Xid | 3007010 | 391472945 | COMMIT /* xid=470986202 */ | |
| mariadb-bin.000074 | 391472945 | Gtid | 3007010 | 391472987 | BEGIN GTID 0-3007010-1620164073 | |
| mariadb-bin.000074 | 391472987 | Table_map | 3007010 | 391473114 | table_id: 2713 (mall.offers_scores) | |
| mariadb-bin.000074 | 391473114 | Update_rows_v1 | 3007010 | 391473482 | table_id: 2713 flags: STMT_END_F | |
| mariadb-bin.000074 | 391473482 | Xid | 3007010 | 391473513 | COMMIT /* xid=470986204 */ | |
| mariadb-bin.000074 | 391473513 | Gtid | 3007010 | 391473555 | BEGIN GTID 0-3007010-1620164074 | |
| mariadb-bin.000074 | 391473555 | Table_map | 3007010 | 391473688 | table_id: 2857 (mall.intuition) | |
| mariadb-bin.000074 | 391473688 | Write_rows_v1 | 3007010 | 391474639 | table_id: 2857 flags: STMT_END_F | |
| mariadb-bin.000074 | 391474639 | Xid | 3007010 | 391474670 | COMMIT /* xid=470986206 */ | |
| mariadb-bin.000074 | 391474670 | Gtid | 3007010 | 391474712 | BEGIN GTID 0-3007010-1620164075 | |
| mariadb-bin.000074 | 391474712 | Table_map | 3007010 | 391474797 | table_id: 2680 (mall.offers_prices_finals_v2) | |
| mariadb-bin.000074 | 391474797 | Update_rows_v1 | 3007010 | 391474913 | table_id: 2680 flags: STMT_END_F | |
| mariadb-bin.000074 | 391474913 | Xid | 3007010 | 391474944 | COMMIT /* xid=470986208 */ | |
| mariadb-bin.000074 | 391474944 | Gtid | 3007010 | 391474986 | BEGIN GTID 0-3007010-1620164076 | |
| mariadb-bin.000074 | 391474986 | Table_map | 3007010 | 391475063 | table_id: 2669 (mall.offer_flags) | |
| mariadb-bin.000074 | 391475063 | Update_rows_v1 | 3007010 | 391475169 | table_id: 2669 flags: STMT_END_F | |
| mariadb-bin.000074 | 391475169 | Xid | 3007010 | 391475200 | COMMIT /* xid=470986210 */ | |
| mariadb-bin.000074 | 391475200 | Gtid | 3007010 | 391475242 | BEGIN GTID 0-3007010-1620164077 | |
+--------------------+-----------+----------------+-----------+-------------+-----------------------------------------------+
|
20 rows in set (1.57 sec) |
After reading my log on PmaControl I saw :
Date exec_master_log_pos last_io_errno last_io_error last_sql_errno last_sql_error master_log_file wsrep_cluster_size wsrep_cluster_status wsrep_local_state_comment wsrep_ready
|
2018-07-26 04:01:54 390831285 0 0 mariadb-bin.000071 6 Primary Synced ON |
2018-07-26 04:02:06 390973194 0 0 mariadb-bin.000071 6 Primary Synced ON |
2018-07-26 04:02:19 391438449 0 0 mariadb-bin.000071 6 Primary Synced ON |
2018-07-26 04:02:31 391473581 0 1047 Node has dropped from cluster mariadb-bin.000071 6 Primary Synced ON |
2018-07-26 04:02:44 391473581 0 1047 Node has dropped from cluster mariadb-bin.000071 6 Primary Synced ON |
as you can see the problem on mariadb-bin.000071 at 2018-07-26 04:02:31, my question is why show salve status show us Master_Log_File: mariadb-bin.000074, it's should be stop on mariadb-bin.000071 no ?
that was my first question, second one when you restarted slave it's start from :
mariadb-bin.000074 : 391473581 (may i put it manually), ofc i got an error but this msg is kinda crazy, it's should say that position does not exist :
 |
2018-07-26 13:21:29 140396387120896 [Note] Start binlog_dump to slave_server(1740027789), pos(mariadb-bin.000074, 391473581) |
2018-07-26 14:50:31 140396388636416 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473581) |
2018-07-26 19:09:43 140396394395392 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473581) |
2018-07-26 19:21:36 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1769305198, event_type: 108 |
2018-07-26 19:22:35 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1863385196, event_type: 0 |
2018-07-26 19:22:56 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 16777217, event_type: 15 |
2018-07-26 20:09:28 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 145 |
2018-07-26 20:10:00 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 3299606784, event_type: 134 |
2018-07-26 20:21:04 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 19 |
2018-07-26 20:22:04 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 4244377359, event_type: 15 |
2018-07-26 20:22:26 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 8388608, event_type: 0 |
2018-07-26 20:22:29 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 32768, event_type: 0 |
2018-07-26 20:22:32 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 128, event_type: 128 |
2018-07-26 20:22:34 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 0 |
2018-07-26 20:22:37 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1526726656, event_type: 0 |
2018-07-26 20:22:39 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1499136000, event_type: 0 |
2018-07-26 20:22:41 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1465473792, event_type: 0 |
2018-07-26 20:22:43 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1683446107, event_type: 128 |
2018-07-26 20:22:46 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1617188697, event_type: 0 |
2018-07-26 20:22:54 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1617188697, event_type: 0 |
2018-07-26 20:23:16 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 291529815, event_type: 0 |
2018-07-26 20:23:54 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 2113929216, event_type: 34 |
2018-07-26 20:23:59 140396388333312 [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 3758096384, event_type: 0 |
2018-07-26 21:43:46 140396391061248 [Note] Start binlog_dump to slave_server(169996294), pos(mariadb-bin.000074, 391473482) |
Ok now i know where to look :
MariaDB [(none)]> SHOW BINLOG EVENTS IN 'mariadb-bin.000071' from 391473581 limit 0,20; |
+--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
|
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
|
+--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
|
| mariadb-bin.000071 | 391473581 | Gtid | 3007010 | 391473623 | BEGIN GTID 0-3007010-1611553616 | |
| mariadb-bin.000071 | 391473623 | Query | 3007010 | 391473775 | DROP TEMPORARY TABLE IF EXISTS `mall`.`_replaced_unit` /* generated by server */ | |
| mariadb-bin.000071 | 391473775 | Query | 3007010 | 391473853 | COMMIT | |
| mariadb-bin.000071 | 391473853 | Gtid | 3007010 | 391473895 | BEGIN GTID 0-3007010-1611553617 | |
| mariadb-bin.000071 | 391473895 | Table_map | 3007010 | 391473964 | table_id: 2495 (mall.intuition_indexation) | |
| mariadb-bin.000071 | 391473964 | Update_rows_v1 | 3007010 | 391474022 | table_id: 2495 flags: STMT_END_F | |
| mariadb-bin.000071 | 391474022 | Xid | 3007010 | 391474053 | COMMIT /* xid=452403272 */ | |
| mariadb-bin.000071 | 391474053 | Gtid | 3007010 | 391474095 | BEGIN GTID 0-3007010-1611553618 | |
| mariadb-bin.000071 | 391474095 | Table_map | 3007010 | 391474162 | table_id: 2474 (mall.offers_monitor) | |
| mariadb-bin.000071 | 391474162 | Update_rows_v1 | 3007010 | 391474254 | table_id: 2474 flags: STMT_END_F | |
| mariadb-bin.000071 | 391474254 | Xid | 3007010 | 391474285 | COMMIT /* xid=452403274 */ | |
| mariadb-bin.000071 | 391474285 | Gtid | 3007010 | 391474327 | BEGIN GTID 0-3007010-1611553619 | |
| mariadb-bin.000071 | 391474327 | Table_map | 3007010 | 391474386 | table_id: 2479 (mall.intuition_sites) | |
| mariadb-bin.000071 | 391474386 | Write_rows_v1 | 3007010 | 391474428 | table_id: 2479 flags: STMT_END_F | |
| mariadb-bin.000071 | 391474428 | Xid | 3007010 | 391474459 | COMMIT /* xid=452403276 */ | |
| mariadb-bin.000071 | 391474459 | Gtid | 3007010 | 391474501 | BEGIN GTID 0-3007010-1611553620 | |
| mariadb-bin.000071 | 391474501 | Table_map | 3007010 | 391474628 | table_id: 2472 (mall.offers_scores) | |
| mariadb-bin.000071 | 391474628 | Update_rows_v1 | 3007010 | 391475004 | table_id: 2472 flags: STMT_END_F | |
| mariadb-bin.000071 | 391475004 | Xid | 3007010 | 391475035 | COMMIT /* xid=452403278 */ | |
| mariadb-bin.000071 | 391475035 | Gtid | 3007010 | 391475077 | BEGIN GTID 0-3007010-1611553621 | |
+--------------------+-----------+----------------+-----------+-------------+----------------------------------------------------------------------------------+
|
20 rows in set (0.00 sec) |
and in log :
2018-07-26 4:02:21 140018283419392 [Warning] WSREP: SQL statement was ineffective thd: 1359262 buf: 190 |
schema: mall
|
QUERY: COMMIT
|
=> Skipping replication
|
2018-07-26 4:02:21 140018283419392 [Note] WSREP: cluster conflict due to certification failure for threads: |
2018-07-26 4:02:21 140018283419392 [Note] WSREP: Victim thread: |
THD: 1359262, mode: local, state: executing, conflict: cert failure, seqno: -1 |
SQL: COMMIT
|
2018-07-26 4:02:21 140018283419392 [ERROR] Slave SQL: Error 'Got error 35 "Resource deadlock avoided" during COMMIT' on query. Default database: 'mall'. Query: 'COMMIT', Gtid 0-3007010-1611553616, Internal MariaDB error code: 1180 |
2018-07-26 4:02:21 140018283419392 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 0-3007010-1611553616, Internal MariaDB error code: 1047 |
2018-07-26 4:02:21 140018283419392 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000071' at position 391473581 |
My question is how to make a deadlock from replication and galeracluster, ofc course it's should be only user RO on galera cluster but it's was not.
I was kinda surprise to see a deadlock between the thread of replication and the server, and more about droping a TEMPORARY table :
DROP TEMPORARY TABLE IF EXISTS `mall`.`_replaced_unit` /* generated by server */ |
It's linked with Galera cluster, because when it's was with server without Galera it's was working perfectly. (ok it's was not clean)
Attachments
Issue Links
- relates to
-
MDEV-18641 "Resource deadlock avoided during COMMIT" when a node leaves the group gracefully
- Closed
-
MDEV-19572 Async slave on Galera node stops due to 1942 Error during COMMIT: failed to update GTID state in mysql.gtid_slave_pos: 0:
- Closed