[MDEV-10518] Large wsrep_gtid_domain_id may break IST
Created: 2016-08-08  Updated: 2016-08-25  Resolved: 2016-08-25

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.1.16
Fix Version/s: 10.1.17

Type: Bug
Priority: Major
Reporter: Andrew Garner
Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed
Votes: 0
Labels: None


Description

A test environment was generating server_id / wsrep_gtid_domain_id based on certain unsigned 32-bit hash values. This sometimes resulted in a large unsigned value:

MariaDB [(none)]> select @@wsrep_gtid_domain_id;
+------------------------+
| @@wsrep_gtid_domain_id |
+------------------------+
|             3707887008 |
+------------------------+
1 row in set (0.00 sec)

When a node was restarted and IST was initiated, the wsrep-sst-method=xtrabackup-v2 script was started with a negative gtid-domain-id, as seen in the donor's logs:

[Note] WSREP: IST request: c0bdbd55-5b47-11e6-b904-d2ce4687a92f:9588-9788|tcp://10.0.0.1:4568
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.0.0.1:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/'    --binlog '/var/log/mysql/mysql-bin' --gtid 'c0bdbd55-5b47-11e6-b904-d2ce4687a92f:9588' --gtid-domain-id '-587080288' --bypass'
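
The negative value in this log line is what results when the configured unsigned 32-bit value is reinterpreted as a signed 32-bit integer. The Python sketch below only illustrates that arithmetic; the helper name `as_signed_32` is hypothetical and this is not the server's code.

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a signed one (two's complement)."""
    # Pack as unsigned 32-bit, then unpack the same four bytes as signed 32-bit.
    return struct.unpack("<i", struct.pack("<I", value))[0]


domain_id = 3707887008  # @@wsrep_gtid_domain_id from the report
print(as_signed_32(domain_id))  # -587080288, the value passed to --gtid-domain-id
```

Equivalently, 3707887008 - 2^32 = -587080288.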

And this resulted in a failure on the joiner:

WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20160808 18:09:44.535)
WSREP_SST: [INFO] Galera co-ords from recovery: c0bdbd55-5b47-11e6-b904-d2ce4687a92f:9588 -587080288 (20160808 18:09:44.538)
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20160808 18:09:44.540)
WSREP_SST: [INFO] Removing the sst_in_progress file (20160808 18:09:44.542)
[ERROR] WSREP: Failed to get donor wsrep_gtid_domain_id.
[ERROR] WSREP: SST failed: 22 (Invalid argument)
[ERROR] Aborting
mysqld[779]: Error in my_thread_global_end(): 1 threads didn't exit

As a workaround, the wsrep_gtid_domain_id was clamped to the range of a signed 32-bit integer, which avoided the problem. However, I think large unsigned 32-bit values should probably work.



Comments
Comment by Andrew Garner [ 2016-08-08 ]

I see I was a bit off in my analysis: clamping to a positive signed 32-bit integer isn't sufficient, and I just got lucky when I adjusted the value. I now see where this error stems from: the value of wsrep-gtid-domain-id is restricted to the range of a 16-bit integer here:

https://github.com/MariaDB/server/blob/3fd214c8be7c2340ebe06f4c887c67f5c928e5f0/sql/wsrep_sst.cc#L498

So a workaround requires keeping this value within a somewhat smaller range than I originally suggested.
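
For an environment that derives the ID from a hash, one way to apply such a workaround is to reduce the hash into a small range before using it. This is a minimal sketch under stated assumptions: the bound `SAFE_MAX` assumes the range of a positive signed 16-bit integer (0 to 32767), which may be narrower or wider than what the linked code actually accepts; `safe_domain_id` is a hypothetical helper; and `zlib.crc32` merely stands in for whatever hash the test environment used.

```python
import zlib

# Assumption: stay within a positive signed 16-bit integer.
SAFE_MAX = 2**15 - 1


def safe_domain_id(name: str) -> int:
    """Derive a domain ID from a name, reduced to the range 0..SAFE_MAX."""
    h = zlib.crc32(name.encode("utf-8"))  # unsigned 32-bit hash value
    return h % (SAFE_MAX + 1)


# "cluster-a" is a made-up input for illustration.
print(safe_domain_id("cluster-a"))
```

Reducing the range this way makes collisions between different inputs more likely, so the resulting IDs should still be checked for uniqueness across clusters.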

Comment by Nirbhay Choubey (Inactive) [ 2016-08-23 ]

andrew.garner Thanks for pointing it out!

Comment by Nirbhay Choubey (Inactive) [ 2016-08-23 ]

http://lists.askmonty.org/pipermail/commits/2016-August/009684.html
