[MDEV-19084] Galera cluster gcache.page files creation at startup/restart Created: 2019-03-29  Updated: 2021-11-04  Resolved: 2021-01-13

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.13, 10.3.20
Fix Version/s: 10.3.25, 10.4.14

Type: Bug
Priority: Critical
Reporter: Roope Pääkkönen (Inactive)
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 3
Labels: None
Environment: Centos 7
Attachments: wsrep-debug-log.txt (text file)
Issue Links:
Relates
relates to MDEV-26968 Galera cluster gcache.page files crea... Closed
relates to MDEV-23060 Mariadb 10.4.12 gcache is created aft... Open

 Description   

We're experiencing the following behavior, seemingly at random:

We have a MariaDB 10.3 Galera cluster with 3 nodes. When we restart one node, it rejoins the cluster normally (in less than a minute), but it immediately begins creating gcache.page.XXXXX file(s) once it has rejoined.

The gcache size is 2 gigabytes, and database usage during the restart was quite small.
Afterwards, the same instance keeps creating new gcache.page.xxx files, while the other nodes do not.
The same happens when restarting other nodes: they rejoin normally and may or may not start creating gcache.page files right away.

What could be causing this kind of behaviour?

As I understand it, the biggest "problem" this causes for us is that a node using gcache.page files cannot serve IST to a joiner and always falls back to a full SST.
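One way to watch for the symptom described above is to list any gcache.page.* files in the node's data directory after a restart. This is a minimal sketch: the helper name `list_gcache_page_files` is hypothetical, and `/var/lib/mysql` is an assumed data directory that does not come from the report.

```python
from pathlib import Path


def list_gcache_page_files(data_dir):
    """Return (name, size_in_bytes) for each gcache.page.* file, sorted by name."""
    return sorted(
        (p.name, p.stat().st_size)
        for p in Path(data_dir).glob("gcache.page.*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Assumed data directory; adjust to the node's actual datadir.
    for name, size in list_gcache_page_files("/var/lib/mysql"):
        print(f"{name}\t{size / 2**20:.1f} MiB")
```

An empty result on a freshly restarted node would suggest it is still using only the ring buffer file.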



 Comments   
Comment by Roope Pääkkönen (Inactive) [ 2019-04-09 ]

I searched around and found this Percona XtraDB Cluster issue, whose description is very similar to what I'm seeing:
https://jira.percona.com/browse/PXC-887 and the issue it references: https://github.com/codership/galera/issues/485

That is, it feels like the server sometimes forgets the actual gcache file on startup/restart and just uses page files.
In some cases the server was replicating around 100 kB/s according to wsrep_replicated_bytes, but it was creating a 100 MB gcache.page file every ~10 seconds, which seemed excessive.
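To put the two figures from this comment side by side, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 kB = 1000 bytes, 1 MB = 1000 kB) and looks only at the replicated rate quoted above; writesets received from other nodes are not quantified in the comment and are ignored here.

```python
def bytes_per_second(total_bytes, seconds):
    """Average throughput in bytes per second."""
    return total_bytes / seconds


# Figures from the comment; decimal units are an assumption.
REPLICATED_RATE = 100 * 1000                      # ~100 kB/s, from wsrep_replicated_bytes
PAGE_RATE = bytes_per_second(100 * 1000**2, 10)   # one 100 MB page file every ~10 s

ratio = PAGE_RATE / REPLICATED_RATE
print(f"page files grow at {PAGE_RATE / 1000**2:.0f} MB/s, "
      f"{ratio:.0f}x the replicated rate")
```

Under these assumptions the page files grow about two orders of magnitude faster than the replicated traffic alone would explain.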

Restarting the service once more usually fixes it: gcache.page files are no longer created.

Comment by Roope Pääkkönen (Inactive) [ 2019-05-17 ]

I may have been able to reproduce this locally with sysbench, though not entirely reliably (see the attached wsrep-debug-log.txt).
Setup: 3 VMs in VirtualBox / CentOS 7 / MariaDB 10.3.15

So far I've been able to make it happen after I reboot a vm. I'm not sure if that can be related in some way.

On one node, start this command: sysbench oltp_update_index --tables=10 --table-size=10000000 prepare
Note that gcache.page files aren't created at this point.

Wait until the sbtest1 table is about 500 MB in size, then reboot the server ( sudo reboot ) while sysbench is running.
After the server has rebooted, run drop table sbtest; create table sbtest; as required.

Then run the same sysbench command again; gcache.page files are now created immediately.

I've attached the MariaDB log with wsrep_debug=on.

After this, if I restart the MariaDB service and run the same sysbench command again, gcache.page files are not created.

wsrep_provider_options="gcache.size=1G;gcache.recover=yes;pc.recovery=TRUE"
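For reference, the provider options string above can be split into key/value pairs to check the configured ring buffer size. This is an illustrative sketch only: the helper names `parse_provider_options` and `parse_size` are hypothetical, and the size parser assumes binary K/M/G suffixes, which matches the 1073741824-byte total that appears in the recovery log lines later in this thread.

```python
def parse_provider_options(value):
    """Split a 'key=value;key=value' provider options string into a dict."""
    options = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition("=")
        options[key.strip()] = val.strip()
    return options


def parse_size(text):
    """Convert a size such as '1G' or '128M' to bytes (binary suffixes assumed)."""
    units = {"K": 2**10, "M": 2**20, "G": 2**30}
    suffix = text[-1].upper()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


opts = parse_provider_options("gcache.size=1G;gcache.recover=yes;pc.recovery=TRUE")
print(opts["gcache.size"], "=", parse_size(opts["gcache.size"]), "bytes")
```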

Comment by Roope Pääkkönen (Inactive) [ 2019-08-09 ]

Recently we've seen this on our servers very often whenever they are restarted for minor version upgrades.

I believe this happens whenever lines similar to these are logged on startup:

[Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (         0/1073741272 bytes) complete.
[Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (1073741272/1073741272 bytes) complete.
[Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/1357407 locked buffers
[Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 1073741272/1073741824

The server begins to create gcache.page.XXX files immediately, irrespective of load, even if it doesn't receive any client connections (e.g. nodes that are only used for fail-over do it as well).
Restarting the MariaDB service once more usually stops these page files from being created (and the lines above don't appear in the logs).
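If the "used space" line is read as used/total bytes (an interpretation, not something the log states explicitly), the recovered ring buffer has almost no free space left. The sketch below extracts the two numbers from such a line. The helper name `ring_buffer_usage` is hypothetical, and the regular expression is written against the log lines quoted above only.

```python
import re

# Matches the "used space" line quoted in this comment.
USED_SPACE = re.compile(r"RingBuffer::recover\(\): used space: (\d+)/(\d+)")


def ring_buffer_usage(log_text):
    """Return (used, total, free) in bytes, or None if no matching line is found."""
    match = USED_SPACE.search(log_text)
    if match is None:
        return None
    used, total = int(match.group(1)), int(match.group(2))
    return used, total, total - used


line = ("[Note] WSREP: GCache DEBUG: RingBuffer::recover(): "
        "used space: 1073741272/1073741824")
print(ring_buffer_usage(line))
```

For the line quoted above this leaves 552 bytes free out of 1 GiB, which would be consistent with the node spilling into page files right away.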

Comment by Mark Reibert [ 2020-06-17 ]

I have recently run into this following an upgrade from MariaDB 10.4.12 ⟶ 10.4.13. I performed a rolling upgrade of a three-node cluster, and because I did not know to look for this problem I am now wedged. The scenario is I upgraded node 1, and upon restart it began using the gcache.page files. So at this point it is no good as a donor because the single 128M gcache.page file is not large enough to store anything but a second of writesets for my busy cluster.

Then I upgraded node 2, and after receiving an IST from node 3 (the lone remaining "good" node) it too began using the gcache.page files. So now neither node 1 nor node 2 is effectively available as a donor.

Again, because I didn't know this, I then attempted to upgrade node 3, but of course I cannot get it to join the cluster because of the issue with nodes 1 and 2. Effectively, then, I am left with a two-node cluster where neither node can donate, so I am dead in the water. The only way to fix this is complete downtime on the cluster (for non-rolling restarts).

Assuming the root cause is the same as discussed in https://jira.percona.com/browse/PXC-887, can that fix be ported to MariaDB? For those of us bitten by this it causes much pain.

Comment by Roope Pääkkönen (Inactive) [ 2020-06-17 ]

I just saw that the latest wsrep library releases from Galera include fixes to gcache recovery, so maybe these will help here.

http://releases.galeracluster.com/galera-3/release-notes-galera-25.3.30.txt
http://releases.galeracluster.com/galera-4/release-notes-galera-26.4.5.txt

Comment by Mark Reibert [ 2020-08-10 ]

Still waiting for some kind of movement here.

Comment by Roope Pääkkönen (Inactive) [ 2020-11-28 ]

For us, updating MariaDB to 10.3.25 with the updated Galera libraries seems to have fixed the issue.

Comment by Alexey [ 2021-01-04 ]

Until Galera 3.30 there was a bug in GCache ring buffer recovery that could make most of the ring buffer unavailable, effectively making it tiny, and as a result GCache needed to allocate page files right from the start.

This is fixed in 3.30 and later, released May 2020.

Comment by Mark Reibert [ 2021-01-04 ]

Yes, I do not believe I have encountered this issue since upgrading to MariaDB 10.4.14 (which brings along Galera 26.4.5).

Comment by Mark Reibert [ 2021-11-03 ]

I recently upgraded to Ubuntu 20.04 running MariaDB 10.4.21 / Galera 26.4.9 and this problem has resurfaced. So it looks like we have a regression.

I opened MDEV-26968 to address the regression since I do not appear to have the permission necessary to re-open this issue.
