  MariaDB Server / MDEV-10518

Large wsrep_gtid_domain_id may break IST


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.1.16
    • Fix Version/s: 10.1.17
    • Component/s: Galera SST
    • Labels:
      None

      Description

      A test environment generated server_id and wsrep_gtid_domain_id from unsigned 32-bit hash values. This sometimes produced a value larger than the signed 32-bit maximum (2147483647):

      MariaDB [(none)]> select @@wsrep_gtid_domain_id;
      +------------------------+
      | @@wsrep_gtid_domain_id |
      +------------------------+
      |             3707887008 |
      +------------------------+
      1 row in set (0.00 sec)
      

      When a node was restarted and IST was initiated, the wsrep-sst-method=xtrabackup-v2 script was started with a negative --gtid-domain-id, as seen in the donor's log:

      [Note] WSREP: IST request: c0bdbd55-5b47-11e6-b904-d2ce4687a92f:9588-9788|tcp://10.0.0.1:4568
      [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.0.0.1:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/'    --binlog '/var/log/mysql/mysql-bin' --gtid 'c0bdbd55-5b47-11e6-b904-d2ce4687a92f:9588' --gtid-domain-id '-587080288' --bypass'
      

      And this resulted in a failure on the joiner:

      WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20160808 18:09:44.535)
      WSREP_SST: [INFO] Galera co-ords from recovery: c0bdbd55-5b47-11e6-b904-d2ce4687a92f:9588 -587080288 (20160808 18:09:44.538)
      WSREP_SST: [INFO] Total time on joiner: 0 seconds (20160808 18:09:44.540)
      WSREP_SST: [INFO] Removing the sst_in_progress file (20160808 18:09:44.542)
      [ERROR] WSREP: Failed to get donor wsrep_gtid_domain_id.
      [ERROR] WSREP: SST failed: 22 (Invalid argument)
      [ERROR] Aborting
      mysqld[779]: Error in my_thread_global_end(): 1 threads didn't exit
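The negative number passed to the SST script and echoed by the joiner is exactly what you get when the 32-bit pattern of 3707887008 is reinterpreted as a signed two's-complement integer: 3707887008 - 2^32 = -587080288. The sketch below only reproduces that arithmetic with Python's struct module. It does not claim to mirror how the server builds the command line internally, and the helper names are illustrative.

```python
import struct


def as_signed_int32(value: int) -> int:
    """Reinterpret the bits of an unsigned 32-bit value as a signed 32-bit int."""
    # struct.pack raises struct.error if value is outside 0..4294967295.
    return struct.unpack("<i", struct.pack("<I", value))[0]


def as_unsigned_int32(value: int) -> int:
    """Recover the unsigned 32-bit value from its signed reinterpretation."""
    return value & 0xFFFFFFFF


domain_id = 3707887008                  # value returned by SELECT @@wsrep_gtid_domain_id
wrapped = as_signed_int32(domain_id)
print(wrapped)                          # -587080288, the value seen in both logs
print(as_unsigned_int32(wrapped))       # 3707887008
```

Values up to 2147483647 pass through unchanged, which is why only domain ids with bit 31 set are affected.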
      

      As a workaround, wsrep_gtid_domain_id was clamped to the range of a signed 32-bit integer, which avoided the problem. However, since the server accepts the larger unsigned value, I think IST should work with it as well.
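The workaround can be sketched as follows. The report does not say which hash or clamping method was used, so this assumes a CRC32 of a node name and treats "clamping" as clearing bit 31 (masking with 0x7FFFFFFF). Both `clamp_to_signed_int32` and `domain_id_for` are hypothetical helpers, not part of MariaDB or Galera.

```python
import zlib

INT32_MAX = 0x7FFFFFFF  # 2147483647, largest signed 32-bit value


def clamp_to_signed_int32(hash_value: int) -> int:
    """Clear bit 31 so an unsigned 32-bit hash fits in 0..INT32_MAX."""
    return (hash_value & 0xFFFFFFFF) & INT32_MAX


def domain_id_for(node_name: str) -> int:
    """Hypothetical helper: derive a domain id from the CRC32 of a node name."""
    return clamp_to_signed_int32(zlib.crc32(node_name.encode("utf-8")))


print(clamp_to_signed_int32(3707887008))  # 1560403360
```

Masking keeps the low 31 bits of the hash, so two hashes that differ only in bit 31 map to the same id; an environment that relies on unique ids may want to check for such collisions.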

              People

              Assignee:
              nirbhay_c Nirbhay Choubey (Inactive)
              Reporter:
              andrew.garner Andrew Garner
              Votes:
              0
              Watchers:
              2