Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.4.13
    • Fix Version/s: N/A
    • Component/s: Galera
    • Labels: None
    • Environment: Linux 4.19.124-gentoo x86_64 AMD EPYC 7451, Intel I350 Gigabit Ethernet

    Description

      Note: the most recent Galera version affected is 1.0.4/26.4.4, but the same applies to previous versions, at least 26.4.3.

      We have a cluster of 3 nodes working with a large (above 1 TB) data volume. All the nodes have the same hardware and software. From time to time nodes run SST and IST to transfer data.
      It was noticed that SST frequently fails and has to be restarted due to the error:

      [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
      

      We tried setting a bigger Galera gcache size, but in some cases the error happened again and in others it didn't. Moreover, sometimes a simple restart of mysqld on the receiving node (and thus a restart of SST once the donor node had returned to the synced state) led to a successful SST and the joiner managed to join the cluster, but sometimes it failed.
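
      For reference, a minimal sketch of how the gcache size was adjusted (assuming the usual wsrep_provider_options syntax in my.cnf; the value shown is purely illustrative):

      [mysqld]
      # Illustrative only: enlarge the ring buffer from which IST is served
      wsrep_provider_options="gcache.size=4G"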

      • It was noticed that the gcache size and the amount of transactions happening on the cluster nodes have no effect on the issue.
      • Disabling or enabling compression of the state transfer data, as well as attempts to flush logs, also had no effect.
      • It was also noticed that whenever IST failed, the same error message could always be found in the log about 20 (+/- 1) minutes after starting mysqld on the joining node (thus, about 20 minutes after the state transfer request). This error was:

      2020-06-01 21:50:42 0 [Note] WSREP: IST sender 232217729 -> 232234231
      ...
      WSREP_SST: [INFO] Evaluating /usr/bin/mariabackup --innobackupex --defaults-file=/etc/mysql/my.cnf     $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2> /var/lib/mysql//mariabackup.backup.log | /usr/bin/zstd --fast=3 | socat -u stdio TCP:***.*.***.*:4444; RC=( ${PIPESTATUS[@]} ) (20200601 21:50:53.977)
      2020-06-01 22:10:59 0 [ERROR] WSREP: async IST sender failed to serve tcp://***.*.***.*:4568: ist send failed: asio.system:110', asio error 'write: Connection timed out': 110 (Connection timed out)
           at galera/src/ist.cpp:send():887
      2020-06-01 22:10:59 0 [Note] WSREP: async IST sender served
      

      Whenever these last two lines (error + note) appeared in the mysqld log file, the state transfer always ended up failing, with the following errors logged:

      2020-06-02  1:49:58 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:232217728, protocol version: 5
      2020-06-02  1:49:58 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
      2020-06-02  1:49:58 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 232234231 last received: 232221423
      2020-06-02  1:49:58 2 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
           at galera/src/replicator_smm.hpp:pop_front():314. Null event.
      

      So, the open questions are:

      1. How can such situations be avoided? Nodes require a manual restart after failed transfers.
      2. Why is this asio error always logged 20 minutes after the state transfer starts? (A guess at the mechanism is sketched after this list.)
      3. The reported failure is 'Connection timed out', while the connection is stable and no service or monitoring tool reports connection issues.
      4. The issue is intermittent: on some restarts it appears and on others it doesn't; this was also true for previous versions of the Galera library. No configuration change seems to cause or solve it.
      5. It was also noted that the asio library used by Galera is 1.10.8 and this version can't be changed, even though version 1.18 is already out.
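
      One possible explanation for the 20-minute figure (an assumption, not something these logs confirm): error 110 on a TCP write means the kernel gave up retransmitting, and with the Linux default net.ipv4.tcp_retries2=15 that takes roughly 15-20 minutes of an unresponsive peer, which matches the observed delay. A quick check on both nodes could look like this:

      # How many times the kernel retransmits an unacknowledged segment before
      # returning ETIMEDOUT (110); the default of 15 corresponds to ~15-20 minutes.
      sysctl net.ipv4.tcp_retries2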

    Attachments

    Issue Links

    Activity

            rgpublic Ranjan Ghosh added a comment -

            This also happens for me very often when I just try to start a node. I start the first node: it works, the DB is running. Then I try to start the second node (with a completely empty /var/lib/mysql folder). The mariabackup SST starts, but on the second node I find this in the logs:

            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 3 [Note] WSREP: SST received: a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896595604
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Joiner monitor thread ended with total time 482 sec
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Installed new state from SST: a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896595604
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Cert. index preload up to 2896595604
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: ####### IST applying starts with 2896595605
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: ####### IST current seqno initialized to 2896576828
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Receiving IST...  0.0% ( 0/37 events) complete.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: IST preload starting at 2896576828
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Service thread queue flushed.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:2896576827, protocol version: 5
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 2896576864 last received: 2896576840
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
            Jul 21 18:43:00 yak2 mariadbd[778444]:          at galera/src/replicator_smm.hpp:pop_front():315. Null event.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closing send monitor...
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closed send monitor.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: terminating thread
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: joining thread
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: closing backend
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: PC protocol downgrade 1 -> 0
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: view((empty))
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: closed
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: New SELF-LEAVE.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Flow-control interval: [0, 0]
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2896595604)
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: RECV thread exiting 0: Success
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: recv_thread() joined.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closing replication queue.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closing slave action queue.
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Cert. index preloaded up to 2896576840
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [ERROR] WSREP: gcs/src/gcs.cpp:s_join():929: Sending JOIN failed: -103 (Software caused connection abort).
            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Warning] WSREP: Failed to JOIN the cluster after SST gcs_join(a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896595604) failed: 103 (Software caused connection abort)
            Jul 21 18:43:00 yak2 mariadbd[778444]:          at galera/src/galera_gcs.hpp:join():218
            

            After that, the cluster is in the "INCONSISTENT" state and doesn't work, neither on the first nor on the second node. I usually resolve this by killing mariadbd on both nodes and then switching back to sst_method: rsync.
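
            For reference, a minimal my.cnf sketch of that fallback (the variable is the standard wsrep_sst_method; whether rsync is acceptable depends on the setup, since it blocks the donor for the duration of the transfer):

            [mysqld]
            # Fallback used above: rsync-based SST instead of mariabackup
            wsrep_sst_method=rsync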

            rgpublic Ranjan Ghosh added a comment -

            PS: On MDEV-22797 I read some talk about timeouts. Perhaps this is useful: we have 2 technically identical clusters (exact same hardware, OS, setup, etc.). On one cluster this problem never happens; on the other it happens very frequently. The only difference is that the database of the problematic cluster (where the log excerpt I just posted is from) is much larger (many more schemas and tables), so the SST runs considerably longer (a few minutes). I therefore find it not unlikely that this has something to do with a timeout, or a timing problem in general. That might be why it doesn't surface for everyone.

            Yurchenko Alexey added a comment -

            Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 2896576864 last received: 2896576840
            

            What does donor log say about that?

            rgpublic Ranjan Ghosh added a comment -

            @Alexey. I just looked it up in the logs. First there's a long list of lines like these repeating for all schemas and tables:

            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 Streaming ./schemaxyz/drupal_anyredirect_group.frm to <STDOUT>
            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 ...done

            And then, finally:

            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:37 Finished backing up non-InnoDB tables and files
            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 Streaming ./aria_log.00000001 to <STDOUT>
            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37         ...done
            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 Streaming ./aria_log_control to <STDOUT>
            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37         ...done
            Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:37 Waiting for log copy thread to read lsn 18807917230561
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 >> log scanned up to (18807917230561)
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming xtrabackup_galera_info
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38         ...done
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 mariabackup: The latest check point (for incremental): '18807917230549'
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: mariabackup: Stopping log copying thread.[00] 2020-07-21 18:42:38 >> log scanned up to (18807917230561)
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: 
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 >> log scanned up to (18807917230561)
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Executing BACKUP STAGE END
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 All tables unlocked
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming ib_buffer_pool to <STDOUT>
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38         ...done
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Backup created in directory '/tmp/tmp.tGaft5xC75/'
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming backup-my.cnf
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38         ...done
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming xtrabackup_info
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38         ...done
            Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Redo log (from LSN 18807858713703 to 18807917230561) was copied.
            Jul 21 18:42:39 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:39 completed OK!
            Jul 21 18:42:39 yak1 mariadbd[132692]: 2020-07-21 18:42:39 0 [Note] WSREP: SST sent: a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896576865
            Jul 21 18:42:39 yak1 mariadbd[132692]: 2020-07-21 18:42:39 0 [Warning] WSREP: server: yak1 unallowed state transition: joined -> joined
            Jul 21 18:42:39 yak1 mariadbd[132692]: mariadbd: /home/buildbot/buildbot/build/mariadb-10.5.4/wsrep-lib/src/server_state.cpp:1356: void wsrep::server_state::state(wsrep:>
            Jul 21 18:42:39 yak1 mariadbd[132692]: 200721 18:42:39 [ERROR] mysqld got signal 6 ;
            Jul 21 18:42:39 yak1 mariadbd[132692]: This could be because you hit a bug. It is also possible that this binary
            Jul 21 18:42:39 yak1 mariadbd[132692]: or one of the libraries it was linked against is corrupt, improperly built,
            Jul 21 18:42:39 yak1 mariadbd[132692]: or misconfigured. This error can also be caused by malfunctioning hardware.
            Jul 21 18:42:39 yak1 mariadbd[132692]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Jul 21 18:42:39 yak1 mariadbd[132692]: We will try our best to scrape up some info that will hopefully help
            Jul 21 18:42:39 yak1 mariadbd[132692]: diagnose the problem, but since we have already crashed,
            Jul 21 18:42:39 yak1 mariadbd[132692]: something is definitely wrong and this may fail.
            Jul 21 18:42:39 yak1 mariadbd[132692]: Server version: 10.5.4-MariaDB-1:10.5.4+maria~focal
            Jul 21 18:42:39 yak1 mariadbd[132692]: key_buffer_size=134217728
            Jul 21 18:42:39 yak1 mariadbd[132692]: read_buffer_size=131072
            Jul 21 18:42:39 yak1 mariadbd[132692]: max_used_connections=21
            Jul 21 18:42:39 yak1 mariadbd[132692]: max_threads=65537
            Jul 21 18:42:39 yak1 mariadbd[132692]: thread_count=6
            Jul 21 18:42:39 yak1 mariadbd[132692]: It is possible that mysqld could use up to
            Jul 21 18:42:39 yak1 mariadbd[132692]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 142752116 K  bytes of memory
            Jul 21 18:42:39 yak1 mariadbd[132692]: Hope that's ok; if not, decrease some variables in the equation.
            Jul 21 18:42:39 yak1 mariadbd[132692]: Thread pointer: 0x7f5a90000c58
            Jul 21 18:42:39 yak1 mariadbd[132692]: Attempting backtrace. You can use the following information to find out
            Jul 21 18:42:39 yak1 mariadbd[132692]: where mysqld died. If you see no messages after this, something went
            Jul 21 18:42:39 yak1 mariadbd[132692]: terribly wrong...
            Jul 21 18:42:39 yak1 mariadbd[132692]: stack_bottom = 0x7f5ab5ffaaa8 thread_stack 0x49000
            Jul 21 18:42:39 yak1 -wsrep-sst-donor[177111]: Total time on donor: 0 seconds
            Jul 21 18:42:39 yak1 -wsrep-sst-donor[177113]: Cleaning up temporary directories
            Jul 21 18:42:39 yak1 mariadbd[132692]: /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x5559d678d9c2]
            Jul 21 18:42:39 yak1 mariadbd[132692]: ??:0(my_print_stacktrace)[0x5559d61db495]
            Jul 21 18:42:39 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f641ef1d3c0]
            Jul 21 18:42:39 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f641ea2418b]
            Jul 21 18:42:39 yak1 mariadbd[132692]: ??:0(gsignal)[0x7f641ea03859]
            Jul 21 18:42:39 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libc.so.6(+0x25729)[0x7f641ea03729]
            Jul 21 18:42:39 yak1 mariadbd[132692]: ??:0(__assert_fail)[0x7f641ea14f36]
            Jul 21 18:42:40 yak1 mariadbd[132692]: /usr/sbin/mariadbd(_ZN5wsrep12server_state5stateERNS_11unique_lockINS_5mutexEEENS0_5stateE+0x657)[0x5559d6810397]
            Jul 21 18:42:40 yak1 mariadbd[132692]: ??:0(wsrep::server_state::state(wsrep::unique_lock<wsrep::mutex>&, wsrep::server_state::state))[0x5559d6810b5f]
            Jul 21 18:42:40 yak1 mariadbd[132692]: ??:0(wsrep::server_state::sst_sent(wsrep::gtid const&, int))[0x5559d64a837c]
            Jul 21 18:42:40 yak1 mariadbd[132692]: ??:0(wsrep_sst_donate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, wsrep::gtid const&, >
            Jul 21 18:42:40 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f641ef11609]
            Jul 21 18:42:40 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f641eb00103]
            Jul 21 18:42:40 yak1 mariadbd[132692]: Trying to get some variables.
            Jul 21 18:42:40 yak1 mariadbd[132692]: Some pointers may be invalid and cause the dump to abort.
            Jul 21 18:42:40 yak1 mariadbd[132692]: Query (0x0):
            Jul 21 18:42:40 yak1 mariadbd[132692]: Connection ID (thread ID): 0
            Jul 21 18:42:40 yak1 mariadbd[132692]: Status: NOT_KILLED
            Jul 21 18:42:40 yak1 mariadbd[132692]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_inter>
            Jul 21 18:42:40 yak1 mariadbd[132692]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            Jul 21 18:42:40 yak1 mariadbd[132692]: information that should help you find out what is causing the crash.
            Jul 21 18:42:40 yak1 mariadbd[132692]: Writing a core file...
            Jul 21 18:42:40 yak1 mariadbd[132692]: Working directory at /var/lib/mysql
            Jul 21 18:42:40 yak1 mariadbd[132692]: Resource Limits:
            Jul 21 18:42:40 yak1 mariadbd[132692]: Limit                     Soft Limit           Hard Limit           Units
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max cpu time              unlimited            unlimited            seconds
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max file size             unlimited            unlimited            bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max data size             unlimited            unlimited            bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max stack size            8388608              unlimited            bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max core file size        0                    unlimited            bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max resident set          unlimited            unlimited            bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max processes             513841               513841               processes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max open files            16364                16364                files
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max locked memory         65536                65536                bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max address space         unlimited            unlimited            bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max file locks            unlimited            unlimited            locks
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max pending signals       513841               513841               signals
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max msgqueue size         819200               819200               bytes
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max nice priority         0                    0
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max realtime priority     0                    0
            Jul 21 18:42:40 yak1 mariadbd[132692]: Max realtime timeout      unlimited            unlimited            us
            Jul 21 18:42:40 yak1 mariadbd[132692]: Core pattern: core
            Jul 21 18:42:41 yak1 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
            Jul 21 18:42:41 yak1 systemd[1]: mariadb.service: Failed with result 'signal'.
            

            Yurchenko Alexey added a comment -

            As for the main report: the error log snippets start too late for a certain diagnosis, but they are consistent with the hypothesis that the donor was heading for SST bypass (i.e. the actual transfer was going to happen through IST). For some reason the SST was taking too long (when it should have been momentary), and the joiner, unable to apply IST events yet, was simply idling on the connection until it timed out.
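
            If that hypothesis holds, one thing that may be worth checking (my assumption, not something stated in this report) is whether TCP keepalive would ever fire on the idle IST connection; with stock Linux settings a silent connection is left unprobed for two hours, far longer than the window observed here:

            # Defaults: probing starts only after 7200 s of idle time
            sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes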

            rgpublic Ranjan Ghosh added a comment -

            Perhaps also related to MDEV-23053? I see some similarities...

            jumpojoy Vasyl Saienko added a comment -

            We've hit the same issue:

            2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 2021-05-19  9:48:11 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.misc:2
            2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 2021-05-19  9:48:11 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 5470044 last received: 5457306
            2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 2021-05-19  9:48:11 10 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
            2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 	 at galera/src/replicator_smm.hpp:pop_front():315. Null event.
            

            On the donor side:

            2021-05-19 09:48:11,670 - OpenStack-Helm Mariadb - INFO - 2021-05-19  9:48:11 0 [ERROR] WSREP: async IST sender failed to serve tcp://192.168.44.112:4568: ist send failed: asio.system:14', asio error 'write: Bad address': 14 (Bad address)
            


            rpizzi Rick Pizzi (Inactive) added a comment -

            The same happened to a customer running 10.5.6.
            On the donor side, the following message appeared:

            2023-06-14  7:59:37 0 [ERROR] WSREP: async IST sender failed to serve ssl://10.29.95.53:4568: Unrecognized writeset version: -1: 71 (Protocol error)
            


            janlindstrom Jan Lindström added a comment -

            In Galera library version 26.4.15, asio was updated to 1.14.1; maybe that can be tested with a more recent version of the MariaDB server. Does the issue still reproduce?

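            A quick way to confirm which Galera provider a running server has actually loaded (standard wsrep status variable; shown only as a sketch):

            # Reports the loaded wsrep provider (Galera) version, e.g. 26.4.x
            mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_provider_version'"
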
            euglorg Eugene added a comment -

            Just in case it helps: since we upgraded to MariaDB 10.6 (and thus asio 1.26), we haven't seen this particular issue, for at least half a year now.

            jumpojoy Vasyl Saienko added a comment -

            Hello Eugene,

            We tried a more recent MariaDB version and still see the same issue, during an abnormal shutdown with kill -9.

            mysql@mariadb-server-0:/$ dpkg -l |grep galera
            ii  galera-4                   26.4.14-ubu2004                   amd64        Replication framework for transactional applications
            mysql@mariadb-server-0:/$ dpkg -l |grep maria
            ii  libdbd-mariadb-perl        1.11-3ubuntu2                     amd64        Perl5 database interface to the MariaDB/MySQL databases
            ii  libmariadb3:amd64          1:10.6.14+maria~ubu2004           amd64        MariaDB database client library
            ii  mariadb-backup             1:10.6.14+maria~ubu2004           amd64        Backup tool for MariaDB server
            ii  mariadb-client-10.6        1:10.6.14+maria~ubu2004           amd64        MariaDB database client binaries
            ii  mariadb-client-core-10.6   1:10.6.14+maria~ubu2004           amd64        MariaDB database core client binaries
            ii  mariadb-common             1:10.6.14+maria~ubu2004           all          MariaDB common configuration files
            ii  mariadb-server             1:10.6.14+maria~ubu2004           all          MariaDB database server (metapackage depending on the latest version)
            ii  mariadb-server-10.6        1:10.6.14+maria~ubu2004           amd64        MariaDB database server binaries
            ii  mariadb-server-core-10.6   1:10.6.14+maria~ubu2004           amd64        MariaDB database core server files
            ii  mysql-common               1:10.6.14+maria~ubu2004           all          MariaDB database common files (e.g. /etc/mysql/my.cnf)
            

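            If useful, a quick way to see whether a newer provider build is already available from the configured repositories (package name as in the listing above; the exact version string will vary):

            # Show installed and candidate versions of the Galera 4 provider package
            apt-cache policy galera-4
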
            jumpojoy Vasyl Saienko added a comment -

            Oh, there is 26.4.15; let me try it and get back with feedback.

            jumpojoy Vasyl Saienko added a comment -

            I confirm that with Galera 26.4.15 the issue is fixed. Do you know when MariaDB with this Galera version will be released officially?


            People

              Assignee: Teemu Ollakka (teemu.ollakka)
              Reporter: Eugene (euglorg)
              Votes: 1
              Watchers: 9

