Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21189

Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang

    XMLWordPrintable

    Details

      Description

      Suppose there is a 3 node mariadb galera cluster and an async replicated node connected to it. The galera cluster has a partitioned table that you wish to keep only for a few weeks, while the slave has several months of data in it. Once in a while you have to drop the partitions in the galera cluster without affecting the async slave.

      On the load-balancer (haproxy): disable traffic to galera1 node.

      galera1 node)
      SET GLOBAL wsrep_OSU_method='RSU';
      SET SESSION sql_log_bin = 0;

      ALTER TABLE db.table DROP PARTITION 201909;
      ALTER TABLE db.table DROP PARTITION 201910;
      ALTER TABLE db.table DROP PARTITION 201911;

      And so on for all galera nodes. This way the galera nodes will have the 201909-201911 partitions dropped, while the slave will have the table as before.

      In some cases the DROP PARTITION statement causes the mariadb process to hang, the operation doesn't end and a new connection cannot be made to the server. Server version 10.2.27. There is nothing in the error log.

      Galera nodes have the following configuration:
      ignore_db_dirs = lost+found
      innodb_buffer_pool_size = 120G
      innodb_flush_log_at_trx_commit = 1
      max_allowed_packet = 60M
      table_open_cache = 800

      wsrep_on=ON
      binlog_format=ROW
      default_storage_engine=innodb
      innodb_autoinc_lock_mode=2
      query_cache_size=0
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_provider_options="gcache.size=20G; gcache.page_size=20G; gcs.fc_limit=256; gcs.fc_factor=0.99; gcs.fc_master_slave=YES"
      wsrep-node-name=node1
      wsrep_node_address=1.1.1.1
      wsrep_cluster_name=cluster
      wsrep_cluster_address="gcomm://1.1.1.1,1.1.1.2,1.1.1.3"
      wsrep_sst_method=mariabackup
      wsrep_sst_auth=user:pass
      wsrep_slave_threads=24

      binlog_checksum = CRC32
      innodb_checksum_algorithm = crc32
      innodb_strict_mode = 0
      log_slave_updates = 1
      master_verify_checksum = 1
      max_connect_errors = 500
      max_connections = 1500
      max_heap_table_size = 1G
      slave_parallel_threads = 20
      slave_sql_verify_checksum = 1
      sql_mode = "NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
      sync_binlog = 1
      tmp_table_size = 1G

      This also happened in version 10.1, but it does not happen all the time. If it happens, only sending kill -9 to mariadb works to stop it.

        Attachments

        1. table.sql
          2 kB
        2. stack.txt
          106 kB
        3. mysqld.err
          24 kB
        4. galera_partition.test
          24 kB
        5. galera_partition.cnf
          0.4 kB

          Activity

            People

            Assignee:
            seppo Seppo Jaakola
            Reporter:
            hydrapolic Tomas Mozes
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: