[MDEV-6333] A deadlock occurred on Galera Clustering Created: 2014-06-12  Updated: 2014-12-23  Resolved: 2014-12-23

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 5.5.37-galera
Fix Version/s: 5.5.41-galera

Type: Bug Priority: Major
Reporter: shin Assignee: Nirbhay Choubey (Inactive)
Resolution: Incomplete Votes: 0
Labels: galera


 Description   

Our System is using Galera Cluster.


MariaDB [test]> show variables like '%wsrep%';

Variable_name Value
wsrep_OSU_method TOI
wsrep_auto_increment_control ON
wsrep_causal_reads OFF
wsrep_certify_nonPK ON
wsrep_cluster_address gcomm://xx.xxx.xx.x1,xx.xxx.xx.x2,xx.xxx.xx.x3
wsrep_cluster_name GC
wsrep_convert_LOCK_to_trx OFF
wsrep_data_home_dir /var/lib/mysql/
wsrep_dbug_option  
wsrep_debug OFF
wsrep_desync OFF
wsrep_drupal_282555_workaround OFF
wsrep_forced_binlog_format NONE
wsrep_load_data_splitting ON
wsrep_log_conflicts OFF
wsrep_max_ws_rows 131072
wsrep_max_ws_size 1073741824
wsrep_mysql_replication_bundle 0
wsrep_node_address xx.xxx.xx.x2
wsrep_node_incoming_address AUTO
wsrep_node_name GC-1
wsrep_notify_cmd  
wsrep_on ON
wsrep_provider /usr/lib64/galera/libgalera_smm.so
wsrep_provider_options base_host = xx.xxx.xx.x2; base_port = 4567; cert.log_conflicts = no; debug = no; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 1; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = P1D; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = xx.xxx.xx.x2; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = P30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 5; socket.checksum = 2;
wsrep_recover OFF
wsrep_replicate_myisam OFF
wsrep_restart_slave OFF
wsrep_retry_autocommit 1
wsrep_slave_threads 1
wsrep_sst_auth  
wsrep_sst_donor  
wsrep_sst_donor_rejects_queries OFF
wsrep_sst_method rsync
wsrep_sst_receive_address AUTO
wsrep_start_position 8e663ba7-f123-11e3-88dc-dfe448d1c69c:1339436

And our DB nodes' auto_increment settings are
Node #1

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 1     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+

Node #2

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 2     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+

Node #3

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
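
For context, with auto_increment_increment = 3 and the per-node offsets shown above, each node draws its auto-generated ids from a disjoint arithmetic sequence, so concurrent inserts on different nodes cannot collide on auto-increment key values. A quick illustrative sketch (Python, not part of the original report) of why the sequences never overlap:

```python
# Each Galera node generates ids: offset, offset+increment, offset+2*increment, ...
increment = 3
offsets = {1: 1, 2: 2, 3: 3}  # node number -> auto_increment_offset

# First 1000 ids each node would generate.
sequences = {
    node: {offset + k * increment for k in range(1000)}
    for node, offset in offsets.items()
}

# The three sequences are pairwise disjoint: no auto-increment collisions.
assert not (sequences[1] & sequences[2])
assert not (sequences[1] & sequences[3])
assert not (sequences[2] & sequences[3])
```

Note that this only protects auto-generated keys; deadlocks from two nodes updating the same existing rows (as in this report) are unaffected by these settings.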

These settings match the post on blog.mariadb.org ( https://blog.mariadb.org/auto-increments-in-galera/ ).
But in our system, a deadlock still occurs while running UPDATE logic inside a transaction.
Our system consists of 2 agents (Active-Active) and 3 DB nodes (Galera Cluster).
If agents 1 & 2 are connected to only one node (e.g. DB node 1), no deadlock occurs.
But if I connect Agent 1 to DB node 1 and Agent 2 to DB node 2, a deadlock occurs.
What is the problem, and how can I solve this deadlock?



 Comments   
Comment by Nirbhay Choubey (Inactive) [ 2014-06-12 ]

shin
It looks like the auto_increment_offset values are the same for Node #1 and Node #3.
This might cause a deadlock. Were they set manually?

Comment by shin [ 2014-06-16 ]

@Nirbhay Choubey
Sorry, that was my mistake.
The auto_increment_offset value of Node #3 is '3', not '1'.
The nodes actually have different values from each other.

Comment by Daniel Black [ 2014-10-30 ]

shin, deadlocks are usually an application problem: concurrent transactions updating the same rows.

https://dev.mysql.com/doc/refman/5.6/en/innodb-deadlocks.html

With Galera, deadlocks occur at commit time rather than earlier as they would on a single instance. Perhaps your application isn't checking for a deadlock on commit?
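
A common mitigation is to treat a deadlock reported at commit as retryable and re-run the whole transaction. A minimal sketch (illustrative Python; `DeadlockError` here is a stand-in for the driver's deadlock error, MySQL/MariaDB errno 1213 — this is not code from the report):

```python
class DeadlockError(Exception):
    """Stand-in for the driver's deadlock/certification-conflict error (errno 1213)."""

def commit_with_retry(run_txn, max_retries=3):
    """Run a transaction callable, retrying when commit hits a deadlock.

    `run_txn` should execute the transaction's statements and commit,
    raising DeadlockError when Galera certification fails at commit.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return run_txn()
        except DeadlockError:
            if attempt == max_retries:
                raise  # give up after the final attempt

# Example: a transaction that conflicts twice, then succeeds on retry.
attempts = []
def flaky_txn():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError()
    return "committed"

result = commit_with_retry(flaky_txn)
```

The key point is that the retry wraps the entire transaction, not just the COMMIT statement, since the work must be redone against the post-conflict state.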

If you still think this is a bug, can you provide the table structures (from 'SHOW CREATE TABLE x') and the SQL UPDATE statements issued by the applications (from the binlogs with annotations turned on, if it is a closed-source application)?

Comment by Nirbhay Choubey (Inactive) [ 2014-12-23 ]

shin, as danblack suggested, if you still think the issue is related to the server, then kindly reopen it and provide us with the table structure (SHOW CREATE TABLE <table-name>) and the queries that cause the deadlock.

Generated at Thu Feb 08 07:11:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.