Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Incomplete
- Affects Version: 10.4.13
- Fix Version: None
- Environment: Linux 4.19.124-gentoo x86_64, AMD EPYC 7451, Intel I350 Gigabit Ethernet
Description
Note: the most recently affected Galera version is 1.0.4/26.4.4, but the same applies to previous versions, at least 26.4.3.
We have a cluster of 3 nodes working with a large (above 1 TB) data volume. All nodes have the same hardware and software. From time to time the nodes run SST and IST to transfer data.
It was noticed that SST frequently fails and has to be restarted due to the error:
[ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
We tried setting a bigger Galera gcache size, but in some cases the error happened again and in others it did not. Moreover, sometimes a simple restart of mysqld on the receiving node (and thus a restart of SST once the donor node had returned to the synced state) led to a successful SST and the joiner managed to join the cluster, but sometimes it failed again.
- It was noticed that the gcache size and the number of transactions happening on the cluster nodes have no effect on the issue (a sketch of the settings we varied follows the question list below).
- Disabling or enabling compression of the state transfer data, as well as attempts to flush logs, also had no effect.
- It was also noticed that whenever IST failed, the same error message could always be found in the log about 20 (+/- 1) minutes after starting mysqld on the joining node (thus, about 20 minutes after the state transfer request). The error was:
2020-06-01 21:50:42 0 [Note] WSREP: IST sender 232217729 -> 232234231
...
WSREP_SST: [INFO] Evaluating /usr/bin/mariabackup --innobackupex --defaults-file=/etc/mysql/my.cnf $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2> /var/lib/mysql//mariabackup.backup.log | /usr/bin/zstd --fast=3 | socat -u stdio TCP:***.*.***.*:4444; RC=( ${PIPESTATUS[@]} ) (20200601 21:50:53.977)
2020-06-01 22:10:59 0 [ERROR] WSREP: async IST sender failed to serve tcp://***.*.***.*:4568: ist send failed: asio.system:110', asio error 'write: Connection timed out': 110 (Connection timed out)
at galera/src/ist.cpp:send():887
2020-06-01 22:10:59 0 [Note] WSREP: async IST sender served
The appearance of these last two lines (error + note) in the mysqld log file always meant that the state transfer failed, with the following errors logged:
2020-06-02 1:49:58 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:232217728, protocol version: 5
2020-06-02 1:49:58 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
2020-06-02 1:49:58 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 232234231 last received: 232221423
2020-06-02 1:49:58 2 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
at galera/src/replicator_smm.hpp:pop_front():314. Null event.
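For orientation, the two TCP channels involved are visible in the excerpts above: the SST stream goes over port 4444 (the mariabackup | zstd | socat pipeline) and IST uses a separate connection on port 4568. A small, hypothetical way to watch both sockets while a joiner is being provisioned (ports taken from the log lines, defaults assumed):
# run on the donor or the joiner during the transfer
ss -tnp | grep -E ':(4444|4568)'
# or keep refreshing to see when either connection disappears
watch -n 5 "ss -tn '( sport = :4444 or dport = :4444 or sport = :4568 or dport = :4568 )'"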
So, the open questions are:
- How can such situations be avoided? Nodes currently require a manual restart after a failed transfer.
- Why is this asio error always logged 20 minutes after the state transfer starts?
- The reported failure is 'Connection timed out', while the connection is stable and no service or monitoring tool reports any connection issues.
- The issue is intermittent: on some restarts it appears and on others it doesn't, and this was also the case with the previous version of the Galera library. No configuration change seems to cause or solve it.
- It was also noted that the asio library used by Galera is version 1.10.8 and cannot be changed, even though version 1.18 is already out.
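For reference, a minimal sketch of the two settings we varied during testing. The file path, the gcache value and the compressor command are examples only (not our exact production values); the [sst] compressor/decompressor options are assumed to be what produces the zstd pipeline visible in the SST log line above.
# example only: enlarge the gcache and toggle SST stream compression
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
wsrep_provider_options="gcache.size=4G"

[sst]
compressor='zstd --fast=3'
decompressor='zstd -d'
EOF
# (restart mysqld afterwards so the new provider options take effect)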
Attachments
Issue Links
- relates to MDEV-22797 "galera uses old version of asio library" (Open)
Activity
PS: On MDEV-22797 I read some talk about timeouts. Perhaps this is useful: we have two technically identical clusters (exact same hardware, OS, setup, etc.). On one cluster this problem never happens; on the other it happens very frequently. The only difference is that the database of the problematic cluster (where the log excerpt I just posted is from) is much larger (many more schemas and tables), so the SST runs considerably longer (a few minutes). So I find it quite plausible that this has something to do with a timeout, or a timing problem in general. That might be why it doesn't surface for everyone.
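If it helps to check the timing angle, the SST duration is already logged on both sides, so the two clusters can be compared directly. A rough sketch (journal unit name is an assumption; the grep patterns are taken from log lines quoted elsewhere in this report):
# donor side: the SST script reports the total donor time
journalctl -u mariadb | grep 'Total time on donor'
# joiner side: the joiner monitor thread reports its total time
journalctl -u mariadb | grep 'Joiner monitor thread ended with total time'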
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 2896576864 last received: 2896576840
What does the donor log say about that?
@Alexey: I just looked it up in the logs. First there is a long list of lines like these, repeating for all schemas and tables:
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 Streaming ./schemaxyz/drupal_anyredirect_group.frm to <STDOUT>
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 ...done
And then, finally:
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:37 Finished backing up non-InnoDB tables and files
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 Streaming ./aria_log.00000001 to <STDOUT>
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 ...done
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 Streaming ./aria_log_control to <STDOUT>
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [01] 2020-07-21 18:42:37 ...done
Jul 21 18:42:37 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:37 Waiting for log copy thread to read lsn 18807917230561
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 >> log scanned up to (18807917230561)
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming xtrabackup_galera_info
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 ...done
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Executing FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS...
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 mariabackup: The latest check point (for incremental): '18807917230549'
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: mariabackup: Stopping log copying thread.[00] 2020-07-21 18:42:38 >> log scanned up to (18807917230561)
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]:
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 >> log scanned up to (18807917230561)
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Executing BACKUP STAGE END
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 All tables unlocked
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming ib_buffer_pool to <STDOUT>
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 ...done
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Backup created in directory '/tmp/tmp.tGaft5xC75/'
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming backup-my.cnf
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 ...done
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Streaming xtrabackup_info
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 ...done
Jul 21 18:42:38 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:38 Redo log (from LSN 18807858713703 to 18807917230561) was copied.
Jul 21 18:42:39 yak1 -innobackupex-backup[136486]: [00] 2020-07-21 18:42:39 completed OK!
Jul 21 18:42:39 yak1 mariadbd[132692]: 2020-07-21 18:42:39 0 [Note] WSREP: SST sent: a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896576865
Jul 21 18:42:39 yak1 mariadbd[132692]: 2020-07-21 18:42:39 0 [Warning] WSREP: server: yak1 unallowed state transition: joined -> joined
Jul 21 18:42:39 yak1 mariadbd[132692]: mariadbd: /home/buildbot/buildbot/build/mariadb-10.5.4/wsrep-lib/src/server_state.cpp:1356: void wsrep::server_state::state(wsrep:>
Jul 21 18:42:39 yak1 mariadbd[132692]: 200721 18:42:39 [ERROR] mysqld got signal 6 ;
Jul 21 18:42:39 yak1 mariadbd[132692]: This could be because you hit a bug. It is also possible that this binary
Jul 21 18:42:39 yak1 mariadbd[132692]: or one of the libraries it was linked against is corrupt, improperly built,
Jul 21 18:42:39 yak1 mariadbd[132692]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jul 21 18:42:39 yak1 mariadbd[132692]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jul 21 18:42:39 yak1 mariadbd[132692]: We will try our best to scrape up some info that will hopefully help
Jul 21 18:42:39 yak1 mariadbd[132692]: diagnose the problem, but since we have already crashed,
Jul 21 18:42:39 yak1 mariadbd[132692]: something is definitely wrong and this may fail.
Jul 21 18:42:39 yak1 mariadbd[132692]: Server version: 10.5.4-MariaDB-1:10.5.4+maria~focal
Jul 21 18:42:39 yak1 mariadbd[132692]: key_buffer_size=134217728
Jul 21 18:42:39 yak1 mariadbd[132692]: read_buffer_size=131072
Jul 21 18:42:39 yak1 mariadbd[132692]: max_used_connections=21
Jul 21 18:42:39 yak1 mariadbd[132692]: max_threads=65537
Jul 21 18:42:39 yak1 mariadbd[132692]: thread_count=6
Jul 21 18:42:39 yak1 mariadbd[132692]: It is possible that mysqld could use up to
Jul 21 18:42:39 yak1 mariadbd[132692]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 142752116 K bytes of memory
Jul 21 18:42:39 yak1 mariadbd[132692]: Hope that's ok; if not, decrease some variables in the equation.
Jul 21 18:42:39 yak1 mariadbd[132692]: Thread pointer: 0x7f5a90000c58
Jul 21 18:42:39 yak1 mariadbd[132692]: Attempting backtrace. You can use the following information to find out
Jul 21 18:42:39 yak1 mariadbd[132692]: where mysqld died. If you see no messages after this, something went
Jul 21 18:42:39 yak1 mariadbd[132692]: terribly wrong...
Jul 21 18:42:39 yak1 mariadbd[132692]: stack_bottom = 0x7f5ab5ffaaa8 thread_stack 0x49000
Jul 21 18:42:39 yak1 -wsrep-sst-donor[177111]: Total time on donor: 0 seconds
Jul 21 18:42:39 yak1 -wsrep-sst-donor[177113]: Cleaning up temporary directories
Jul 21 18:42:39 yak1 mariadbd[132692]: /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x5559d678d9c2]
Jul 21 18:42:39 yak1 mariadbd[132692]: ??:0(my_print_stacktrace)[0x5559d61db495]
Jul 21 18:42:39 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7f641ef1d3c0]
Jul 21 18:42:39 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f641ea2418b]
Jul 21 18:42:39 yak1 mariadbd[132692]: ??:0(gsignal)[0x7f641ea03859]
Jul 21 18:42:39 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libc.so.6(+0x25729)[0x7f641ea03729]
Jul 21 18:42:39 yak1 mariadbd[132692]: ??:0(__assert_fail)[0x7f641ea14f36]
Jul 21 18:42:40 yak1 mariadbd[132692]: /usr/sbin/mariadbd(_ZN5wsrep12server_state5stateERNS_11unique_lockINS_5mutexEEENS0_5stateE+0x657)[0x5559d6810397]
Jul 21 18:42:40 yak1 mariadbd[132692]: ??:0(wsrep::server_state::state(wsrep::unique_lock<wsrep::mutex>&, wsrep::server_state::state))[0x5559d6810b5f]
Jul 21 18:42:40 yak1 mariadbd[132692]: ??:0(wsrep::server_state::sst_sent(wsrep::gtid const&, int))[0x5559d64a837c]
Jul 21 18:42:40 yak1 mariadbd[132692]: ??:0(wsrep_sst_donate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, wsrep::gtid const&, >
Jul 21 18:42:40 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7f641ef11609]
Jul 21 18:42:40 yak1 mariadbd[132692]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f641eb00103]
Jul 21 18:42:40 yak1 mariadbd[132692]: Trying to get some variables.
Jul 21 18:42:40 yak1 mariadbd[132692]: Some pointers may be invalid and cause the dump to abort.
Jul 21 18:42:40 yak1 mariadbd[132692]: Query (0x0):
Jul 21 18:42:40 yak1 mariadbd[132692]: Connection ID (thread ID): 0
Jul 21 18:42:40 yak1 mariadbd[132692]: Status: NOT_KILLED
Jul 21 18:42:40 yak1 mariadbd[132692]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_inter>
Jul 21 18:42:40 yak1 mariadbd[132692]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jul 21 18:42:40 yak1 mariadbd[132692]: information that should help you find out what is causing the crash.
Jul 21 18:42:40 yak1 mariadbd[132692]: Writing a core file...
Jul 21 18:42:40 yak1 mariadbd[132692]: Working directory at /var/lib/mysql
Jul 21 18:42:40 yak1 mariadbd[132692]: Resource Limits:
Jul 21 18:42:40 yak1 mariadbd[132692]: Limit Soft Limit Hard Limit Units
Jul 21 18:42:40 yak1 mariadbd[132692]: Max cpu time unlimited unlimited seconds
Jul 21 18:42:40 yak1 mariadbd[132692]: Max file size unlimited unlimited bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max data size unlimited unlimited bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max stack size 8388608 unlimited bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max core file size 0 unlimited bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max resident set unlimited unlimited bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max processes 513841 513841 processes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max open files 16364 16364 files
Jul 21 18:42:40 yak1 mariadbd[132692]: Max locked memory 65536 65536 bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max address space unlimited unlimited bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max file locks unlimited unlimited locks
Jul 21 18:42:40 yak1 mariadbd[132692]: Max pending signals 513841 513841 signals
Jul 21 18:42:40 yak1 mariadbd[132692]: Max msgqueue size 819200 819200 bytes
Jul 21 18:42:40 yak1 mariadbd[132692]: Max nice priority 0 0
Jul 21 18:42:40 yak1 mariadbd[132692]: Max realtime priority 0 0
Jul 21 18:42:40 yak1 mariadbd[132692]: Max realtime timeout unlimited unlimited us
Jul 21 18:42:40 yak1 mariadbd[132692]: Core pattern: core
Jul 21 18:42:41 yak1 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Jul 21 18:42:41 yak1 systemd[1]: mariadb.service: Failed with result 'signal'.
As for the main report: the error log snippets start too late for a certain diagnosis. However, they are consistent with the hypothesis that the donor was heading for an SST bypass (i.e. the actual transfer was going to be done through IST), but for some reason the SST phase was taking too long (when it should have been momentary), and the joiner, unable to apply IST events yet, was just idling on the connection until it timed out.
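If someone can reproduce this on a test pair, one way to check that hypothesis is to correlate the donor's IST sender messages with the joiner's IST receiver messages by timestamp. A rough sketch (the error log path is an assumption; adjust to your setup):
# donor: when did the IST sender start serving and when did it give up?
grep -E 'WSREP: (async )?IST sender' /var/log/mysql/error.log
# joiner: when was the state transfer requested and when did reading the IST stream fail?
grep -E 'WSREP: (State transfer|Receiving IST|got asio system error)' /var/log/mysql/error.log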
We've run into the same issue:
2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 2021-05-19 9:48:11 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.misc:2
2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 2021-05-19 9:48:11 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 5470044 last received: 5457306
2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - 2021-05-19 9:48:11 10 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
2021-05-19 09:48:11,721 - OpenStack-Helm Mariadb - INFO - at galera/src/replicator_smm.hpp:pop_front():315. Null event.
On the donor side:
2021-05-19 09:48:11,670 - OpenStack-Helm Mariadb - INFO - 2021-05-19 9:48:11 0 [ERROR] WSREP: async IST sender failed to serve tcp://192.168.44.112:4568: ist send failed: asio.system:14', asio error 'write: Bad address': 14 (Bad address)
The same happened to a customer running 10.5.6.
On the donor side, the following message appeared:
2023-06-14 7:59:37 0 [ERROR] WSREP: async IST sender failed to serve ssl://10.29.95.53:4568: Unrecognized writeset version: -1: 71 (Protocol error)
Galera library version 26.4.15 updates asio to 1.14.1; maybe that can be tested with a more recent version of the MariaDB server. Does the issue still reproduce?
For what it's worth: since we upgraded to mariadb-10.6 (and thus asio-1.26), we haven't seen this particular issue for at least half a year.
Hello Eugene,
We tried a more recent MariaDB version and still see the same issue during an abnormal shutdown with kill -9 (see the sketch after the package list below).
mysql@mariadb-server-0:/$ dpkg -l |grep galera
ii galera-4 26.4.14-ubu2004 amd64 Replication framework for transactional applications
mysql@mariadb-server-0:/$ dpkg -l |grep maria
ii libdbd-mariadb-perl 1.11-3ubuntu2 amd64 Perl5 database interface to the MariaDB/MySQL databases
ii libmariadb3:amd64 1:10.6.14+maria~ubu2004 amd64 MariaDB database client library
ii mariadb-backup 1:10.6.14+maria~ubu2004 amd64 Backup tool for MariaDB server
ii mariadb-client-10.6 1:10.6.14+maria~ubu2004 amd64 MariaDB database client binaries
ii mariadb-client-core-10.6 1:10.6.14+maria~ubu2004 amd64 MariaDB database core client binaries
ii mariadb-common 1:10.6.14+maria~ubu2004 all MariaDB common configuration files
ii mariadb-server 1:10.6.14+maria~ubu2004 all MariaDB database server (metapackage depending on the latest version)
ii mariadb-server-10.6 1:10.6.14+maria~ubu2004 amd64 MariaDB database server binaries
ii mariadb-server-core-10.6 1:10.6.14+maria~ubu2004 amd64 MariaDB database core server files
ii mysql-common 1:10.6.14+maria~ubu2004 all MariaDB database common files (e.g. /etc/mysql/my.cnf)
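For completeness, a rough sketch of the scenario in which we hit it. The commands are illustrative only; the service name and environment will differ (in a Kubernetes deployment like ours the pod is simply killed), the point being that an unclean kill forces the node to rejoin via SST/IST:
# simulate the abnormal shutdown on one node of a healthy cluster
kill -9 "$(pidof mariadbd)"
# restart the node so it rejoins and requests a state transfer
systemctl start mariadb
# watch the rejoining node for the IST failure signature
journalctl -u mariadb -f | grep -E 'WSREP: (Receiving IST|IST didn|got asio system error)'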
I confirm that with Galera 26.4.15 the issue is fixed. Do you know when MariaDB with this Galera version will be released officially?
This also happens for me very often when I just try to start a node. I start the first node: it works, the DB is running. Then I try to start the second node (with a completely empty /var/lib/mysql folder). The mariabackup SST starts, but on the second node I find this in the logs:
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 3 [Note] WSREP: SST received: a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896595604
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Joiner monitor thread ended with total time 482 sec
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Installed new state from SST: a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896595604
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Cert. index preload up to 2896595604
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: ####### IST applying starts with 2896595605
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: ####### IST current seqno initialized to 2896576828
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Receiving IST... 0.0% ( 0/37 events) complete.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: IST preload starting at 2896576828
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Service thread queue flushed.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:2896576827, protocol version: 5
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 2896576864 last received: 2896576840
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
Jul 21 18:43:00 yak2 mariadbd[778444]: at galera/src/replicator_smm.hpp:pop_front():315. Null event.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closing send monitor...
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closed send monitor.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: terminating thread
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: joining thread
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: closing backend
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: PC protocol downgrade 1 -> 0
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: view((empty))
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: gcomm: closed
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: New SELF-LEAVE.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Flow-control interval: [0, 0]
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2896595604)
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 0 [Note] WSREP: RECV thread exiting 0: Success
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: recv_thread() joined.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closing replication queue.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Closing slave action queue.
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Note] WSREP: Cert. index preloaded up to 2896576840
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [ERROR] WSREP: gcs/src/gcs.cpp:s_join():929: Sending JOIN failed: -103 (Software caused connection abort).
Jul 21 18:43:00 yak2 mariadbd[778444]: 2020-07-21 18:43:00 2 [Warning] WSREP: Failed to JOIN the cluster after SST gcs_join(a6a91495-1b7f-11ea-a0be-7f78e01c49de:2896595604) failed: 103 (Software caused connection abort)
Jul 21 18:43:00 yak2 mariadbd[778444]: at galera/src/galera_gcs.hpp:join():218
After that, the cluster is in state "INCONSISTENT" and doesn't work, neither on the first nor on the second node. I usually resolve this by killing mariadbd on both nodes and then switching back to sst_method: rsync.
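In case it helps anyone else, a minimal sketch of that workaround as I apply it (the file location is an example; note that rsync SST blocks the donor for the whole transfer, unlike mariabackup):
# example only: switch the SST method back to rsync
cat >> /etc/mysql/my.cnf <<'EOF'
[mysqld]
wsrep_sst_method=rsync
EOF
# then restart mariadbd on the affected nodes and let the second node rejoin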