[MDEV-25739] Failed to create a new provider '/usr/lib/galera/libgalera_smm.so' with options Created: 2021-05-20  Updated: 2021-08-17  Resolved: 2021-07-19

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.5.9, 10.5.10, 10.5
Fix Version/s: 10.4.21, 10.5.12, 10.6.4

Type: Bug
Priority: Blocker
Reporter: Jaroslav
Assignee: Alexey
Resolution: Fixed
Votes: 2
Labels: None

Issue Links:
Relates
relates to MDEV-24615 MariaDB 10.5.8 Galera node fails to s... Closed
relates to MDEV-25605 Failed to initialize wsrep provider -... Closed

 Description   

After upgrading to 10.5.10 (and also on 10.5.9), we started to see the following issue: a node never syncs back to the cluster and ends up in a failed state.

jaro@cv-sqa-us-east4-k8s-lmgmt-a:~$ kubectl logs mysql-2 -n sde mysql
2021/05/20 07:04:07 Peer list updated
was []
now [mysql-0.mysql.sde.svc.cluster.local mysql-1.mysql.sde.svc.cluster.local mysql-2.mysql.sde.svc.cluster.local]
2021/05/20 07:04:07 execing: /opt/galera/on-start.sh with stdin: mysql-0.mysql.sde.svc.cluster.local
mysql-1.mysql.sde.svc.cluster.local
mysql-2.mysql.sde.svc.cluster.local
2021/05/20 07:04:07 *** [Galera] Joining cluster: mysql-0.mysql.sde.svc.cluster.local,mysql-1.mysql.sde.svc.cluster.local
2021/05/20 07:04:08 Peer finder exiting
Galera - Determining recovery position...
galera-recovery.sh: Attempting to recover GTID positon...
2021-05-20  7:04:08 0 [Note] mysqld (mysqld 10.5.10-MariaDB-1:10.5.10+maria~focal) starting as process 48 ...
galera-recovery.sh: Found WSREP position: 6c9afcd7-96b7-11ea-96a5-76e81fcbb085:19951024
Galera recovery position: --wsrep_start_position=6c9afcd7-96b7-11ea-96a5-76e81fcbb085:19951024
2021-05-20  7:04:09 0 [Note] mysqld (mysqld 10.5.10-MariaDB-1:10.5.10+maria~focal) starting as process 1 ...
2021-05-20  7:04:09 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 6c9afcd7-96b7-11ea-96a5-76e81fcbb085:19951024
2021-05-20  7:04:09 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2021-05-20  7:04:09 0 [Note] WSREP: wsrep_load(): Galera 26.4.8(r902dd268) by Codership Oy <info@codership.com> loaded successfully.
2021-05-20  7:04:09 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-05-20  7:04:09 0 [Note] WSREP: Found saved state: 6c9afcd7-96b7-11ea-96a5-76e81fcbb085:-1, safe_to_bootstrap: 1
2021-05-20  7:04:09 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 6c9afcd7-96b7-11ea-96a5-76e81fcbb085
Seqno: -1 - -1
Offset: -1
Synced: 0
2021-05-20  7:04:09 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 6c9afcd7-96b7-11ea-96a5-76e81fcbb085, offset: -1
2021-05-20  7:04:09 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2021-05-20  7:04:09 0 [ERROR] WSREP: deque::_M_new_elements_at_back
2021-05-20  7:04:09 0 [ERROR] WSREP: Failed to create a new provider '/usr/lib/galera/libgalera_smm.so' with options '': Failed to initialize wsrep provider
2021-05-20  7:04:09 0 [ERROR] WSREP: Failed to load provider
2021-05-20  7:04:09 0 [ERROR] Aborting

On the other nodes we also saw:

2021/05/20 07:37:08 Peer list updated
was []
now [mysql-0.mysql.default.svc.cluster.local mysql-1.mysql.default.svc.cluster.local mysql-2.mysql.default.svc.cluster.local]
2021/05/20 07:37:08 execing: /opt/galera/on-start.sh with stdin: mysql-0.mysql.default.svc.cluster.local
mysql-1.mysql.default.svc.cluster.local
mysql-2.mysql.default.svc.cluster.local
2021/05/20 07:37:08 *** [Galera] Joining cluster: mysql-1.mysql.default.svc.cluster.local,mysql-2.mysql.default.svc.cluster.local
2021/05/20 07:37:09 Peer finder exiting
Galera - Determining recovery position...
galera-recovery.sh: Attempting to recover GTID positon...
2021-05-20  7:37:09 0 [Note] mysqld (mysqld 10.5.9-MariaDB-1:10.5.9+maria~focal) starting as process 49 ...
galera-recovery.sh: Found WSREP position: 8338b624-66eb-11eb-93e0-323a1dc8d4de:8714889
Galera recovery position: --wsrep_start_position=8338b624-66eb-11eb-93e0-323a1dc8d4de:8714889
2021-05-20  7:37:10 0 [Note] mysqld (mysqld 10.5.9-MariaDB-1:10.5.9+maria~focal) starting as process 1 ...
2021-05-20  7:37:10 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 8338b624-66eb-11eb-93e0-323a1dc8d4de:8714889
2021-05-20  7:37:10 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2021-05-20  7:37:10 0 [Note] WSREP: wsrep_load(): Galera 4.7(ree4f10fc) by Codership Oy <info@codership.com> loaded successfully.
2021-05-20  7:37:10 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-05-20  7:37:10 0 [Note] WSREP: Found saved state: 8338b624-66eb-11eb-93e0-323a1dc8d4de:-1, safe_to_bootstrap: 1
2021-05-20  7:37:10 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 8338b624-66eb-11eb-93e0-323a1dc8d4de
Seqno: -1 - -1
Offset: -1
Synced: 0
2021-05-20  7:37:10 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 8338b624-66eb-11eb-93e0-323a1dc8d4de, offset: -1
2021-05-20  7:37:10 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
2021-05-20  7:37:10 0 [ERROR] WSREP: std::bad_alloc
2021-05-20  7:37:10 0 [ERROR] WSREP: Failed to create a new provider '/usr/lib/galera/libgalera_smm.so' with options '': Failed to initialize wsrep provider
2021-05-20  7:37:10 0 [ERROR] WSREP: Failed to load provider
2021-05-20  7:37:10 0 [ERROR] Aborting

The only thing we can do is delete the node's disk and let it fully resync. From the linked tickets, it seems that /var/lib/mysql/galera.cache somehow got corrupted, and deleting it "solves" the issue.
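The failure signature and the workaround described above can be sketched as a small helper. This is only an illustration, not part of MariaDB or Galera: the function names are hypothetical, the matched strings are copied from the logs in this ticket, and renaming the file (rather than deleting it) is an assumption made here so that a copy stays available for inspection. It presumably should only be run while mysqld is stopped.

```python
import os
import time
from typing import Optional

# Log fragments copied from the output in this ticket.
GCACHE_SCAN = "GCache::RingBuffer initial scan"
PROVIDER_FAILURE = "Failed to create a new provider"


def looks_like_gcache_failure(log_text: str) -> bool:
    """Return True if the provider failed after the GCache recovery scan began."""
    scan_seen = False
    for line in log_text.splitlines():
        if GCACHE_SCAN in line:
            scan_seen = True
        elif scan_seen and "[ERROR]" in line and PROVIDER_FAILURE in line:
            return True
    return False


def set_aside_gcache(datadir: str = "/var/lib/mysql") -> Optional[str]:
    """Rename galera.cache out of the way; return the new path, or None if absent.

    Hypothetical helper: the ticket only reports that deleting the file
    works around the problem. Renaming is used here to keep a copy.
    """
    cache = os.path.join(datadir, "galera.cache")
    if not os.path.exists(cache):
        return None
    backup = f"{cache}.corrupt-{int(time.time())}"
    os.rename(cache, backup)
    return backup
```

A wrapper around the start script could call looks_like_gcache_failure() on the captured mysqld output and, only on a match, call set_aside_gcache() before retrying, instead of wiping the whole disk.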



 Comments   
Comment by Sergio Charrua [ 2021-05-28 ]

Same issue here with 10.5.9, but with only one node (no Galera nodes were running at the time).

https://jira.mariadb.org/browse/MDEV-24615

Comment by Alexey [ 2021-07-19 ]

The GCache ring buffer is not synced to disk and is not guaranteed to be consistent on restart. Hence bogus values can be read from disk, and potential exceptions must be handled. In particular, in this case an insert may throw an exception for a bogus seqno value. Fixed in commit 4bb58377cf3ee02e4c69ce329c2e099b07c79368
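The general idea in the comment above (treat an exception raised while indexing recovered entries as a sign of an inconsistent cache, rather than letting it abort provider initialization) can be sketched as follows. This is a toy model in Python, not the Galera code: SeqnoIndex, MAX_PLAUSIBLE_SEQNO, and recover() are hypothetical names, and the ticket does not say whether the actual fix discards the whole cache or only the offending entries. Discarding everything is the assumption made here.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical sanity bound for this sketch; not a Galera constant.
MAX_PLAUSIBLE_SEQNO = 2**40


class SeqnoIndex:
    """Toy stand-in for a seqno-to-offset index that rejects bogus seqnos."""

    def __init__(self) -> None:
        self._entries: Dict[int, int] = {}

    def insert(self, seqno: int, offset: int) -> None:
        # Mimics an insert that throws when handed a garbage seqno.
        if not 0 < seqno <= MAX_PLAUSIBLE_SEQNO:
            raise ValueError(f"bogus seqno {seqno}")
        self._entries[seqno] = offset

    def clear(self) -> None:
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)


def recover(records: Iterable[Tuple[int, int]]) -> SeqnoIndex:
    """Index (seqno, offset) records read from disk; fall back to empty on error."""
    index = SeqnoIndex()
    try:
        for seqno, offset in records:
            index.insert(seqno, offset)
    except ValueError:
        # The on-disk data is inconsistent: drop what was recovered so far
        # and continue with an empty cache instead of failing startup.
        index.clear()
    return index
```

With this shape, a cache containing one garbage record yields an empty index and startup proceeds, while a clean cache is recovered in full.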

Comment by Alexey [ 2021-07-19 ]

Fixed in Galera commit 4bb58377cf3ee02e4c69ce329c2e099b07c79368
