[MDEV-11035] Restore removed disallow-writes for Galera Created: 2016-10-12  Updated: 2018-04-08  Resolved: 2017-02-07

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2
Fix Version/s: 10.2.4

Type: Bug Priority: Major
Reporter: Jan Lindström (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: db-logs_2018-03-29.txt, db-logs_2018-03-29_donor.txt, sGAaCvfk.txt
Issue Links:
Duplicate
duplicates MDEV-10949 innodb_disallow_writes does not work ... Closed
Relates
relates to MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoi... Closed
relates to MDEV-10949 innodb_disallow_writes does not work ... Closed
relates to MDEV-15770 We have three node galera cluster wit... Open

 Comments   
Comment by Jan Lindström (Inactive) [ 2016-10-12 ]

http://lists.askmonty.org/pipermail/commits/2016-October/009983.html

Is this enough for Galera?

Comment by Nirbhay Choubey (Inactive) [ 2016-10-13 ]

jplindst This patch does not fix the problem. After applying your patch, I enabled @@global.innodb_disallow_writes and was still able to insert a record into an InnoDB table. There is also a test that can be used to check this feature: galera.galera_var_innodb_disallow_writes.

Comment by Jan Lindström (Inactive) [ 2016-11-28 ]

http://lists.askmonty.org/pipermail/commits/2016-October/009989.html

Comment by Nirbhay Choubey (Inactive) [ 2017-01-27 ]

jplindst Looks OK: http://lists.askmonty.org/pipermail/commits/2016-October/009989.html

Comment by Jan Lindström (Inactive) [ 2017-02-07 ]

commit 2aa47d9849e72cab1724b768f11aaa5d953c7153
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Tue Jan 31 12:25:25 2017 +0200

MDEV-11035: Restore removed disallow-writes for Galera

Galera disallow-writes feature was lost in InnoDB 5.7 merge
to 10.2. This patch restores this feature and fixes test
failure on test galera.galera_var_innodb_disallow_writes.

Comment by Marko Mäkelä [ 2018-03-29 ]

The fix instrumented the function os_file_write_page(), which was later renamed back to os_file_write_func(). It seems to cover both redo and data page writes.

While checking this, it seems to me that in MariaDB 10.1 (and presumably 10.0-galera), fil_io() invokes os_aio() which in turn can invoke os_aio_linux_dispatch() without any instrumentation. This would seem to imply that innodb_disallow_writes is broken if innodb_use_native_aio=1.

The parameter innodb_use_native_aio is enabled by default. However, if libaio is not available or not enabled at build time, the code would not be built. Maybe this happens to be the case with 10.0-galera and 10.1?
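
The concern above can be illustrated with a toy model. This is only a sketch, not InnoDB code: WriteGate, ToyFile, sync_write and async_dispatch are hypothetical names standing in for the disallow-writes switch, an instrumented synchronous write path, and an uninstrumented asynchronous dispatch path. It shows why a gate that is checked on only one path does not stop writes issued through the other.

```python
import threading


class WriteGate:
    """Toy stand-in for a 'disallow writes' switch (hypothetical, not InnoDB code)."""

    def __init__(self):
        self._allowed = threading.Event()
        self._allowed.set()  # writes are allowed by default

    def disallow(self):
        self._allowed.clear()

    def allow(self):
        self._allowed.set()

    def wait_until_allowed(self, timeout=None):
        # Returns True once writes are allowed, False if the timeout expires first.
        return self._allowed.wait(timeout)


class ToyFile:
    """In-memory 'file' with one gated and one ungated write path."""

    def __init__(self, gate):
        self.gate = gate
        self.pages = []

    def sync_write(self, page, timeout=None):
        # Instrumented path: waits while writes are disallowed.
        if not self.gate.wait_until_allowed(timeout):
            return False
        self.pages.append(page)
        return True

    def async_dispatch(self, page):
        # Uninstrumented path: never consults the gate, so the write goes through.
        self.pages.append(page)
        return True


gate = WriteGate()
toy = ToyFile(gate)
gate.disallow()
blocked = toy.sync_write("redo", timeout=0.01)  # gated path times out
leaked = toy.async_dispatch("data")             # ungated path still writes
```

In this model, blocked is False while leaked is True, and only the "data" page reaches toy.pages. Whether the real 10.1 code behaves this way depends on the build and on which I/O path is actually taken, as discussed above.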

Comment by Tobias Genberg [ 2018-03-29 ]

After provoking a full SST using rsync as the transport (rm -rf /var/lib/mysql/*), I am not able to sync again.

The sync fails with this in the logs:
Mar 29 14:04:59 db3.fraggelberget.nu sh[30134]: 2018-03-29 14:04:57 139896963414144 [Note] InnoDB: Starting final batch to recover 404 pages from redo log.
Mar 29 14:04:59 db3.fraggelberget.nu sh[30134]: 2018-03-29 14:04:57 139896963414144 [ERROR] [FATAL] InnoDB: Trying to read page number 722386 in space 56113, space name dynmap/Tiles, which is outside the tablespace bounds. Byte offset 0, len 16384
Mar 29 14:04:59 db3.fraggelberget.nu sh[30134]: 180329 14:04:57 [ERROR] mysqld got signal 6 ;

The Tiles tablespace was around 25G the first time and around 15G the second time.
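
For reference, the byte offset of the page named in the error can be worked out from the log line. This is a minimal sketch that assumes the page size equals the "len 16384" value in the log; page_byte_offset and page_in_bounds are hypothetical helper names, not InnoDB functions.

```python
def page_byte_offset(page_no, page_size):
    """Byte offset at which a page starts inside a tablespace file."""
    return page_no * page_size


def page_in_bounds(page_no, page_size, file_size_bytes):
    """True if the whole page fits inside a file of the given size."""
    return page_byte_offset(page_no, page_size) + page_size <= file_size_bytes


# Values taken from the log line above; the page size is an assumption.
offset = page_byte_offset(722386, 16384)
print(offset)                    # 11835572224
print(round(offset / 2**30, 2))  # 11.02 (GiB)
```

Under that assumption, the read would be out of bounds for any copy of the file smaller than roughly 11.02 GiB, which would be consistent with a tablespace file that was only partly transferred or was transferred at a different size than the redo log expects.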

Setting innodb_use_native_aio=false or innodb_use_native_aio=true in the config does not change anything.

CHECK TABLE returns OK on the donor node.

The full log should be attached.

Comment by Tobias Genberg [ 2018-03-29 ]

I am unable to reproduce this after switching the transport to mariabackup.

Comment by Seppo Jaakola [ 2018-04-05 ]

Tobias, if you still have error logs from the rsync SST failure, please attach a longer history from before the failure. I need to see how the rsync joiner activities were carried out, so I would need everything from the sending of the SST request up to the node failure.

Please also attach error logs from the donor node for the matching time window.

Comment by Tobias Genberg [ 2018-04-06 ]

After looking very closely at the logs again,
I think this might be related to a timeout event happening in systemd during startup.

I have not been able to recreate this, since my databases are now in sync again after I used mariabackup as the transport.
But I have attached full logs from the failing node and from the donor, from one point in time.

I hope it helps.

// T

Comment by Marko Mäkelä [ 2018-04-08 ]

MDEV-14705 extends systemd timeouts on startup and shutdown. Could it solve this? Or would some other similar work be needed? I am not familiar with the details of the snapshot transfer procedure.
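
As background on the mechanism named in the MDEV-14705 title: a service can ask systemd for more time by sending an EXTEND_TIMEOUT_USEC= message to the socket named in the NOTIFY_SOCKET environment variable. The following is a minimal sketch of that notification protocol only. It is not how MariaDB implements it, and extend_timeout_message and notify_extend_timeout are hypothetical helper names.

```python
import os
import socket


def extend_timeout_message(seconds):
    """Build the sd_notify payload that requests extra time, in microseconds."""
    usec = int(seconds * 1_000_000)
    return f"EXTEND_TIMEOUT_USEC={usec}".encode("ascii")


def notify_extend_timeout(seconds, environ=None):
    """Send the request to systemd; return False when not run under systemd."""
    environ = os.environ if environ is None else environ
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # no notification socket, nothing to do
    if path.startswith("@"):
        path = "\0" + path[1:]  # Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(extend_timeout_message(seconds), path)
    return True
```

A long-running step such as receiving a snapshot transfer would need to send this message repeatedly, each time before the previously granted extension runs out. Whether that is sufficient for the rsync SST case reported here is not established in this thread.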

Generated at Thu Feb 08 07:46:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.